vllm/models at ab3a5a8259922ce312d01be39d29e27666968039 - vllm

Isotr0py ab3a5a8259 Support OLMo models. (#2832)	2024-02-18 21:05:15 -08:00
__init__.py	Support OLMo models. (#2832)	2024-02-18 21:05:15 -08:00
baichuan.py	Revert "Refactor llama family models (#2637)" (#2851)	2024-02-13 09:24:59 -08:00
bloom.py	Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221)	2024-01-03 11:30:22 -08:00
chatglm.py	Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221)	2024-01-03 11:30:22 -08:00
decilm.py	Fix DeciLM (#2883)	2024-02-14 22:29:57 -08:00
deepseek.py	Add fused top-K softmax kernel for MoE (#2769)	2024-02-05 17:38:02 -08:00
falcon.py	Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221)	2024-01-03 11:30:22 -08:00
gpt2.py	Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221)	2024-01-03 11:30:22 -08:00
gpt_bigcode.py	Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221)	2024-01-03 11:30:22 -08:00
gpt_j.py	Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221)	2024-01-03 11:30:22 -08:00
gpt_neox.py	Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221)	2024-01-03 11:30:22 -08:00
internlm2.py	Revert "Refactor llama family models (#2637)" (#2851)	2024-02-13 09:24:59 -08:00
llama.py	Fix internlm after https://github.com/vllm-project/vllm/pull/2860 (#2861)	2024-02-13 18:01:15 -08:00
mistral.py	Add LoRA support for Mixtral (#2831)	2024-02-14 00:55:45 +01:00
mixtral_quant.py	Add quantized mixtral support (#2673)	2024-01-30 16:34:10 -08:00
mixtral.py	Align LoRA code between Mistral and Mixtral (fixes #2875) (#2880)	2024-02-15 01:00:43 +01:00
mpt.py	Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221)	2024-01-03 11:30:22 -08:00
olmo.py	Support OLMo models. (#2832)	2024-02-18 21:05:15 -08:00
opt.py	Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221)	2024-01-03 11:30:22 -08:00
phi.py	Address Phi modeling update 2 (#2428)	2024-01-12 12:16:49 -08:00
qwen2.py	fix names and license for Qwen2 (#2589)	2024-01-24 22:37:51 -08:00
qwen.py	Revert "Refactor llama family models (#2637)" (#2851)	2024-02-13 09:24:59 -08:00
stablelm.py	Revert "Refactor llama family models (#2637)" (#2851)	2024-02-13 09:24:59 -08:00