vllm/vllm/model_executor/models
| File | Last commit | Date |
| --- | --- | --- |
| `__init__.py` | [Core] Subclass ModelRunner to support cross-attention & encoder sequences (towards eventual encoder/decoder model support) (#4942) | 2024-08-06 16:51:47 -04:00 |
| `arctic.py` | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |
| `baichuan.py` | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |
| `bart.py` | [Core] Subclass ModelRunner to support cross-attention & encoder sequences (towards eventual encoder/decoder model support) (#4942) | 2024-08-06 16:51:47 -04:00 |
| `blip2.py` | [Core][VLM] Support image embeddings as input (#6613) | 2024-08-12 16:16:06 +08:00 |
| `blip.py` | [Model] Initial support for BLIP-2 (#5920) | 2024-07-27 11:53:07 +00:00 |
| `bloom.py` | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |
| `chameleon.py` | [Bugfix][Model] Skip loading lm_head weights if using tie_word_embeddings (#6758) | 2024-07-31 19:49:11 -07:00 |
| `chatglm.py` | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |
| `clip.py` | [Core][VLM] Support image embeddings as input (#6613) | 2024-08-12 16:16:06 +08:00 |
| `commandr.py` | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |
| `dbrx.py` | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |
| `decilm.py` | [Model] Add base class for LoRA-supported models (#5018) | 2024-06-27 16:03:04 +08:00 |
| `deepseek_v2.py` | [Model] Pipeline Parallel Support for DeepSeek v2 (#6519) | 2024-07-23 12:22:09 -07:00 |
| `deepseek.py` | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |
| `falcon.py` | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |
| `fuyu.py` | [Core][VLM] Support image embeddings as input (#6613) | 2024-08-12 16:16:06 +08:00 |
| `gemma2.py` | [Misc] Support attention logits soft-capping with flash-attn (#7022) | 2024-08-01 13:14:37 -07:00 |
| `gemma.py` | [Bugfix] Lower gemma's unloaded_params exception to warning (#7002) | 2024-08-01 12:01:07 -07:00 |
| `gpt2.py` | [ Misc ] non-uniform quantization via compressed-tensors for Llama (#6515) | 2024-07-18 22:39:18 -04:00 |
| `gpt_bigcode.py` | [Bugfix] GPTBigCodeForCausalLM: Remove lm_head from supported_lora_modules. (#6326) | 2024-07-11 07:05:59 -07:00 |
| `gpt_j.py` | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |
| `gpt_neox.py` | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |
| `idefics2_vision_model.py` | [Model] Refactor MiniCPMV (#7020) | 2024-08-04 08:12:41 +00:00 |
| `interfaces.py` | [Core] Support serving encoder/decoder models (#7258) | 2024-08-09 10:39:41 +08:00 |
| `intern_vit.py` | [Model] Refactor and decouple weight loading logic for InternVL2 model (#7067) | 2024-08-02 22:36:14 -07:00 |
| `internlm2.py` | [Model] Initialize support for InternVL2 series models (#6514) | 2024-07-29 10:16:30 +00:00 |
| `internvl.py` | [Core][VLM] Support image embeddings as input (#6613) | 2024-08-12 16:16:06 +08:00 |
| `jais.py` | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |
| `jamba.py` | [Model][Jamba] Mamba cache single buffer (#6739) | 2024-08-09 10:07:06 -04:00 |
| `llama_embedding.py` | [Model][Misc] Add e5-mistral-7b-instruct and Embedding API (#3734) | 2024-05-11 11:30:37 -07:00 |
| `llama.py` | [Core] Support loading GGUF model (#5191) | 2024-08-05 17:54:23 -06:00 |
| `llava_next.py` | [Core][VLM] Support image embeddings as input (#6613) | 2024-08-12 16:16:06 +08:00 |
| `llava.py` | [Core][VLM] Support image embeddings as input (#6613) | 2024-08-12 16:16:06 +08:00 |
| `medusa.py` | [Speculative Decoding] Medusa Implementation with Top-1 proposer (#4978) | 2024-07-09 18:34:02 -07:00 |
| `minicpm.py` | [Bugfix][Model] Skip loading lm_head weights if using tie_word_embeddings (#6758) | 2024-07-31 19:49:11 -07:00 |
| `minicpmv.py` | [Model] Rename MiniCPMVQwen2 to MiniCPMV2.6 (#7273) | 2024-08-08 14:02:41 +00:00 |
| `mixtral_quant.py` | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |
| `mixtral.py` | [ Misc ] non-uniform quantization via compressed-tensors for Llama (#6515) | 2024-07-18 22:39:18 -04:00 |
| `mlp_speculator.py` | [Bugfix] Fix speculative decoding with MLPSpeculator with padded vocabulary (#7218) | 2024-08-08 22:08:46 -07:00 |
| `mpt.py` | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |
| `na_vit.py` | [Model] Refactor MiniCPMV (#7020) | 2024-08-04 08:12:41 +00:00 |
| `nemotron.py` | [Model] Support Nemotron models (Nemotron-3, Nemotron-4, Minitron) (#6611) | 2024-07-26 14:33:42 -04:00 |
| `olmo.py` | [Bugfix][Model] Skip loading lm_head weights if using tie_word_embeddings (#6758) | 2024-07-31 19:49:11 -07:00 |
| `opt.py` | [Model] Initial support for BLIP-2 (#5920) | 2024-07-27 11:53:07 +00:00 |
| `orion.py` | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |
| `paligemma.py` | [Core][VLM] Support image embeddings as input (#6613) | 2024-08-12 16:16:06 +08:00 |
| `persimmon.py` | [Model] Initialize Fuyu-8B support (#3924) | 2024-07-14 05:27:14 +00:00 |
| `phi3_small.py` | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |
| `phi3v.py` | [Core][VLM] Support image embeddings as input (#6613) | 2024-08-12 16:16:06 +08:00 |
| `phi.py` | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |
| `qwen2_moe.py` | [Model] Pipeline parallel support for Qwen2 (#6924) | 2024-07-31 18:49:51 -07:00 |
| `qwen2.py` | [Core] Support loading GGUF model (#5191) | 2024-08-05 17:54:23 -06:00 |
| `qwen.py` | [Models] Support Qwen model with PP (#6974) | 2024-08-01 12:40:43 -07:00 |
| `siglip.py` | [Core][VLM] Support image embeddings as input (#6613) | 2024-08-12 16:16:06 +08:00 |
| `stablelm.py` | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |
| `starcoder2.py` | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |
| `utils.py` | [Model] Support SigLIP encoder and alternative decoders for LLaVA models (#7153) | 2024-08-06 16:55:31 +08:00 |
| `xverse.py` | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |