vllm/models at 29a8d6a554a87292f05b62078976b43a899691e3 - vllm

Seonghyeon bfdcfa6a05 Support starcoder2 architecture (#3089)	2024-02-29 00:51:48 -08:00
neuron	[Neuron] Support inference with transformers-neuronx (#2569)	2024-02-28 09:34:34 -08:00
__init__.py	Support starcoder2 architecture (#3089)	2024-02-29 00:51:48 -08:00
baichuan.py	[Minor] Remove unused config files (#3039)	2024-02-26 17:25:22 -08:00
bloom.py	Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221)	2024-01-03 11:30:22 -08:00
chatglm.py	Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221)	2024-01-03 11:30:22 -08:00
decilm.py	chore(vllm): codespell for spell checking (#2820)	2024-02-21 18:56:01 -08:00
deepseek.py	Add fused top-K softmax kernel for MoE (#2769)	2024-02-05 17:38:02 -08:00
falcon.py	Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221)	2024-01-03 11:30:22 -08:00
gemma.py	Add LoRA support for Gemma (#3050)	2024-02-28 13:03:28 -08:00
gpt2.py	Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221)	2024-01-03 11:30:22 -08:00
gpt_bigcode.py	Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221)	2024-01-03 11:30:22 -08:00
gpt_j.py	Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221)	2024-01-03 11:30:22 -08:00
gpt_neox.py	Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221)	2024-01-03 11:30:22 -08:00
internlm2.py	Revert "Refactor llama family models (#2637)" (#2851)	2024-02-13 09:24:59 -08:00
llama.py	Add LoRA support for Gemma (#3050)	2024-02-28 13:03:28 -08:00
mixtral_quant.py	Add quantized mixtral support (#2673)	2024-01-30 16:34:10 -08:00
mixtral.py	Align LoRA code between Mistral and Mixtral (fixes #2875) (#2880)	2024-02-15 01:00:43 +01:00
mpt.py	Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221)	2024-01-03 11:30:22 -08:00
olmo.py	[Minor] Remove unused config files (#3039)	2024-02-26 17:25:22 -08:00
opt.py	Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221)	2024-01-03 11:30:22 -08:00
orion.py	Support Orion model (#2539)	2024-02-26 19:17:06 -08:00
phi.py	Address Phi modeling update 2 (#2428)	2024-01-12 12:16:49 -08:00
qwen2.py	fix names and license for Qwen2 (#2589)	2024-01-24 22:37:51 -08:00
qwen.py	[Minor] Remove unused config files (#3039)	2024-02-26 17:25:22 -08:00
stablelm.py	Fix stablelm (#3038)	2024-02-26 18:31:10 -08:00
starcoder2.py	Support starcoder2 architecture (#3089)	2024-02-29 00:51:48 -08:00