| Name | Last commit message | Last commit date |
| --- | --- | --- |
| neuron | [Neuron] Support inference with transformers-neuronx (#2569) | 2024-02-28 09:34:34 -08:00 |
| __init__.py | Support starcoder2 architecture (#3089) | 2024-02-29 00:51:48 -08:00 |
| baichuan.py | [Minor] Remove unused config files (#3039) | 2024-02-26 17:25:22 -08:00 |
| bloom.py | Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221) | 2024-01-03 11:30:22 -08:00 |
| chatglm.py | Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221) | 2024-01-03 11:30:22 -08:00 |
| decilm.py | chore(vllm): codespell for spell checking (#2820) | 2024-02-21 18:56:01 -08:00 |
| deepseek.py | Add fused top-K softmax kernel for MoE (#2769) | 2024-02-05 17:38:02 -08:00 |
| falcon.py | Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221) | 2024-01-03 11:30:22 -08:00 |
| gemma.py | Add LoRA support for Gemma (#3050) | 2024-02-28 13:03:28 -08:00 |
| gpt2.py | Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221) | 2024-01-03 11:30:22 -08:00 |
| gpt_bigcode.py | Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221) | 2024-01-03 11:30:22 -08:00 |
| gpt_j.py | Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221) | 2024-01-03 11:30:22 -08:00 |
| gpt_neox.py | Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221) | 2024-01-03 11:30:22 -08:00 |
| internlm2.py | Revert "Refactor llama family models (#2637)" (#2851) | 2024-02-13 09:24:59 -08:00 |
| llama.py | Add LoRA support for Gemma (#3050) | 2024-02-28 13:03:28 -08:00 |
| mixtral_quant.py | Add quantized mixtral support (#2673) | 2024-01-30 16:34:10 -08:00 |
| mixtral.py | Align LoRA code between Mistral and Mixtral (fixes #2875) (#2880) | 2024-02-15 01:00:43 +01:00 |
| mpt.py | Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221) | 2024-01-03 11:30:22 -08:00 |
| olmo.py | [Minor] Remove unused config files (#3039) | 2024-02-26 17:25:22 -08:00 |
| opt.py | Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221) | 2024-01-03 11:30:22 -08:00 |
| orion.py | Support Orion model (#2539) | 2024-02-26 19:17:06 -08:00 |
| phi.py | Address Phi modeling update 2 (#2428) | 2024-01-12 12:16:49 -08:00 |
| qwen2.py | fix names and license for Qwen2 (#2589) | 2024-01-24 22:37:51 -08:00 |
| qwen.py | [Minor] Remove unused config files (#3039) | 2024-02-26 17:25:22 -08:00 |
| stablelm.py | Fix stablelm (#3038) | 2024-02-26 18:31:10 -08:00 |
| starcoder2.py | Support starcoder2 architecture (#3089) | 2024-02-29 00:51:48 -08:00 |