| Name | Last commit message | Last commit date |
| --- | --- | --- |
| __init__.py | [Core] Subclass ModelRunner to support cross-attention & encoder sequences (towards eventual encoder/decoder model support) (#4942) | 2024-08-06 16:51:47 -04:00 |
| arctic.py | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |
| baichuan.py | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |
| bart.py | [Core] Subclass ModelRunner to support cross-attention & encoder sequences (towards eventual encoder/decoder model support) (#4942) | 2024-08-06 16:51:47 -04:00 |
| blip2.py | [Core][VLM] Support image embeddings as input (#6613) | 2024-08-12 16:16:06 +08:00 |
| blip.py | [Model] Initial support for BLIP-2 (#5920) | 2024-07-27 11:53:07 +00:00 |
| bloom.py | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |
| chameleon.py | [Bugfix][Model] Skip loading lm_head weights if using tie_word_embeddings (#6758) | 2024-07-31 19:49:11 -07:00 |
| chatglm.py | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |
| clip.py | [Core][VLM] Support image embeddings as input (#6613) | 2024-08-12 16:16:06 +08:00 |
| commandr.py | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |
| dbrx.py | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |
| decilm.py | [Model] Add base class for LoRA-supported models (#5018) | 2024-06-27 16:03:04 +08:00 |
| deepseek_v2.py | [Model] Pipeline Parallel Support for DeepSeek v2 (#6519) | 2024-07-23 12:22:09 -07:00 |
| deepseek.py | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |
| falcon.py | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |
| fuyu.py | [Core][VLM] Support image embeddings as input (#6613) | 2024-08-12 16:16:06 +08:00 |
| gemma2.py | [Misc] Support attention logits soft-capping with flash-attn (#7022) | 2024-08-01 13:14:37 -07:00 |
| gemma.py | [Bugfix] Lower gemma's unloaded_params exception to warning (#7002) | 2024-08-01 12:01:07 -07:00 |
| gpt2.py | [ Misc ] non-uniform quantization via compressed-tensors for Llama (#6515) | 2024-07-18 22:39:18 -04:00 |
| gpt_bigcode.py | [Bugfix] GPTBigCodeForCausalLM: Remove lm_head from supported_lora_modules. (#6326) | 2024-07-11 07:05:59 -07:00 |
| gpt_j.py | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |
| gpt_neox.py | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |
| idefics2_vision_model.py | [Model]Refactor MiniCPMV (#7020) | 2024-08-04 08:12:41 +00:00 |
| interfaces.py | [Core] Support serving encoder/decoder models (#7258) | 2024-08-09 10:39:41 +08:00 |
| intern_vit.py | [Model] Refactor and decouple weight loading logic for InternVL2 model (#7067) | 2024-08-02 22:36:14 -07:00 |
| internlm2.py | [Model] Initialize support for InternVL2 series models (#6514) | 2024-07-29 10:16:30 +00:00 |
| internvl.py | [Core][VLM] Support image embeddings as input (#6613) | 2024-08-12 16:16:06 +08:00 |
| jais.py | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |
| jamba.py | [Model][Jamba] Mamba cache single buffer (#6739) | 2024-08-09 10:07:06 -04:00 |
| llama_embedding.py | [Model][Misc] Add e5-mistral-7b-instruct and Embedding API (#3734) | 2024-05-11 11:30:37 -07:00 |
| llama.py | [Core] Support loading GGUF model (#5191) | 2024-08-05 17:54:23 -06:00 |
| llava_next.py | [Core][VLM] Support image embeddings as input (#6613) | 2024-08-12 16:16:06 +08:00 |
| llava.py | [Core][VLM] Support image embeddings as input (#6613) | 2024-08-12 16:16:06 +08:00 |
| medusa.py | [Speculative Decoding] Medusa Implementation with Top-1 proposer (#4978) | 2024-07-09 18:34:02 -07:00 |
| minicpm.py | [Bugfix][Model] Skip loading lm_head weights if using tie_word_embeddings (#6758) | 2024-07-31 19:49:11 -07:00 |
| minicpmv.py | [Model] Rename MiniCPMVQwen2 to MiniCPMV2.6 (#7273) | 2024-08-08 14:02:41 +00:00 |
| mixtral_quant.py | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |
| mixtral.py | [ Misc ] non-uniform quantization via compressed-tensors for Llama (#6515) | 2024-07-18 22:39:18 -04:00 |
| mlp_speculator.py | [Bugfix] Fix speculative decoding with MLPSpeculator with padded vocabulary (#7218) | 2024-08-08 22:08:46 -07:00 |
| mpt.py | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |
| na_vit.py | [Model]Refactor MiniCPMV (#7020) | 2024-08-04 08:12:41 +00:00 |
| nemotron.py | [Model] Support Nemotron models (Nemotron-3, Nemotron-4, Minitron) (#6611) | 2024-07-26 14:33:42 -04:00 |
| olmo.py | [Bugfix][Model] Skip loading lm_head weights if using tie_word_embeddings (#6758) | 2024-07-31 19:49:11 -07:00 |
| opt.py | [Model] Initial support for BLIP-2 (#5920) | 2024-07-27 11:53:07 +00:00 |
| orion.py | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |
| paligemma.py | [Core][VLM] Support image embeddings as input (#6613) | 2024-08-12 16:16:06 +08:00 |
| persimmon.py | [Model] Initialize Fuyu-8B support (#3924) | 2024-07-14 05:27:14 +00:00 |
| phi3_small.py | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |
| phi3v.py | [Core][VLM] Support image embeddings as input (#6613) | 2024-08-12 16:16:06 +08:00 |
| phi.py | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |
| qwen2_moe.py | [Model] Pipeline parallel support for Qwen2 (#6924) | 2024-07-31 18:49:51 -07:00 |
| qwen2.py | [Core] Support loading GGUF model (#5191) | 2024-08-05 17:54:23 -06:00 |
| qwen.py | [Models] Support Qwen model with PP (#6974) | 2024-08-01 12:40:43 -07:00 |
| siglip.py | [Core][VLM] Support image embeddings as input (#6613) | 2024-08-12 16:16:06 +08:00 |
| stablelm.py | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |
| starcoder2.py | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |
| utils.py | [Model] Support SigLIP encoder and alternative decoders for LLaVA models (#7153) | 2024-08-06 16:55:31 +08:00 |
| xverse.py | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |