vllm/vllm/model_executor/models
2024-02-29 00:51:48 -08:00
neuron [Neuron] Support inference with transformers-neuronx (#2569) 2024-02-28 09:34:34 -08:00
__init__.py Support starcoder2 architecture (#3089) 2024-02-29 00:51:48 -08:00
baichuan.py [Minor] Remove unused config files (#3039) 2024-02-26 17:25:22 -08:00
bloom.py Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221) 2024-01-03 11:30:22 -08:00
chatglm.py Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221) 2024-01-03 11:30:22 -08:00
decilm.py chore(vllm): codespell for spell checking (#2820) 2024-02-21 18:56:01 -08:00
deepseek.py Add fused top-K softmax kernel for MoE (#2769) 2024-02-05 17:38:02 -08:00
falcon.py Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221) 2024-01-03 11:30:22 -08:00
gemma.py Add LoRA support for Gemma (#3050) 2024-02-28 13:03:28 -08:00
gpt2.py Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221) 2024-01-03 11:30:22 -08:00
gpt_bigcode.py Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221) 2024-01-03 11:30:22 -08:00
gpt_j.py Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221) 2024-01-03 11:30:22 -08:00
gpt_neox.py Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221) 2024-01-03 11:30:22 -08:00
internlm2.py Revert "Refactor llama family models (#2637)" (#2851) 2024-02-13 09:24:59 -08:00
llama.py Add LoRA support for Gemma (#3050) 2024-02-28 13:03:28 -08:00
mixtral_quant.py Add quantized mixtral support (#2673) 2024-01-30 16:34:10 -08:00
mixtral.py Align LoRA code between Mistral and Mixtral (fixes #2875) (#2880) 2024-02-15 01:00:43 +01:00
mpt.py Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221) 2024-01-03 11:30:22 -08:00
olmo.py [Minor] Remove unused config files (#3039) 2024-02-26 17:25:22 -08:00
opt.py Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221) 2024-01-03 11:30:22 -08:00
orion.py Support Orion model (#2539) 2024-02-26 19:17:06 -08:00
phi.py Address Phi modeling update 2 (#2428) 2024-01-12 12:16:49 -08:00
qwen2.py fix names and license for Qwen2 (#2589) 2024-01-24 22:37:51 -08:00
qwen.py [Minor] Remove unused config files (#3039) 2024-02-26 17:25:22 -08:00
stablelm.py Fix stablelm (#3038) 2024-02-26 18:31:10 -08:00
starcoder2.py Support starcoder2 architecture (#3089) 2024-02-29 00:51:48 -08:00