vllm/vllm/model_executor/models
2024-02-18 21:05:15 -08:00
__init__.py Support OLMo models. (#2832) 2024-02-18 21:05:15 -08:00
baichuan.py Revert "Refactor llama family models (#2637)" (#2851) 2024-02-13 09:24:59 -08:00
bloom.py Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221) 2024-01-03 11:30:22 -08:00
chatglm.py Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221) 2024-01-03 11:30:22 -08:00
decilm.py Fix DeciLM (#2883) 2024-02-14 22:29:57 -08:00
deepseek.py Add fused top-K softmax kernel for MoE (#2769) 2024-02-05 17:38:02 -08:00
falcon.py Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221) 2024-01-03 11:30:22 -08:00
gpt2.py Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221) 2024-01-03 11:30:22 -08:00
gpt_bigcode.py Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221) 2024-01-03 11:30:22 -08:00
gpt_j.py Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221) 2024-01-03 11:30:22 -08:00
gpt_neox.py Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221) 2024-01-03 11:30:22 -08:00
internlm2.py Revert "Refactor llama family models (#2637)" (#2851) 2024-02-13 09:24:59 -08:00
llama.py Fix internlm after https://github.com/vllm-project/vllm/pull/2860 (#2861) 2024-02-13 18:01:15 -08:00
mistral.py Add LoRA support for Mixtral (#2831) 2024-02-14 00:55:45 +01:00
mixtral_quant.py Add quantized mixtral support (#2673) 2024-01-30 16:34:10 -08:00
mixtral.py Align LoRA code between Mistral and Mixtral (fixes #2875) (#2880) 2024-02-15 01:00:43 +01:00
mpt.py Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221) 2024-01-03 11:30:22 -08:00
olmo.py Support OLMo models. (#2832) 2024-02-18 21:05:15 -08:00
opt.py Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221) 2024-01-03 11:30:22 -08:00
phi.py Address Phi modeling update 2 (#2428) 2024-01-12 12:16:49 -08:00
qwen2.py fix names and license for Qwen2 (#2589) 2024-01-24 22:37:51 -08:00
qwen.py Revert "Refactor llama family models (#2637)" (#2851) 2024-02-13 09:24:59 -08:00
stablelm.py Revert "Refactor llama family models (#2637)" (#2851) 2024-02-13 09:24:59 -08:00