| __init__.py | Support OLMo models. (#2832) | 2024-02-18 21:05:15 -08:00 |
| baichuan.py | Revert "Refactor llama family models (#2637)" (#2851) | 2024-02-13 09:24:59 -08:00 |
| bloom.py | Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221) | 2024-01-03 11:30:22 -08:00 |
| chatglm.py | Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221) | 2024-01-03 11:30:22 -08:00 |
| decilm.py | Fix DeciLM (#2883) | 2024-02-14 22:29:57 -08:00 |
| deepseek.py | Add fused top-K softmax kernel for MoE (#2769) | 2024-02-05 17:38:02 -08:00 |
| falcon.py | Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221) | 2024-01-03 11:30:22 -08:00 |
| gpt2.py | Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221) | 2024-01-03 11:30:22 -08:00 |
| gpt_bigcode.py | Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221) | 2024-01-03 11:30:22 -08:00 |
| gpt_j.py | Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221) | 2024-01-03 11:30:22 -08:00 |
| gpt_neox.py | Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221) | 2024-01-03 11:30:22 -08:00 |
| internlm2.py | Revert "Refactor llama family models (#2637)" (#2851) | 2024-02-13 09:24:59 -08:00 |
| llama.py | Fix internlm after https://github.com/vllm-project/vllm/pull/2860 (#2861) | 2024-02-13 18:01:15 -08:00 |
| mistral.py | Add LoRA support for Mixtral (#2831) | 2024-02-14 00:55:45 +01:00 |
| mixtral_quant.py | Add quantized mixtral support (#2673) | 2024-01-30 16:34:10 -08:00 |
| mixtral.py | Align LoRA code between Mistral and Mixtral (fixes #2875) (#2880) | 2024-02-15 01:00:43 +01:00 |
| mpt.py | Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221) | 2024-01-03 11:30:22 -08:00 |
| olmo.py | Support OLMo models. (#2832) | 2024-02-18 21:05:15 -08:00 |
| opt.py | Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221) | 2024-01-03 11:30:22 -08:00 |
| phi.py | Address Phi modeling update 2 (#2428) | 2024-01-12 12:16:49 -08:00 |
| qwen2.py | fix names and license for Qwen2 (#2589) | 2024-01-24 22:37:51 -08:00 |
| qwen.py | Revert "Refactor llama family models (#2637)" (#2851) | 2024-02-13 09:24:59 -08:00 |
| stablelm.py | Revert "Refactor llama family models (#2637)" (#2851) | 2024-02-13 09:24:59 -08:00 |