| Name | Last commit message | Last commit date |
| --- | --- | --- |
| __init__.py | [Core] Subclass ModelRunner to support cross-attention & encoder sequences (towards eventual encoder/decoder model support) (#4942) | 2024-08-06 16:51:47 -04:00 |
| arctic.py | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |
| baichuan.py | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |
| bart.py | [Core] Subclass ModelRunner to support cross-attention & encoder sequences (towards eventual encoder/decoder model support) (#4942) | 2024-08-06 16:51:47 -04:00 |
| blip2.py | [Core][VLM] Support image embeddings as input (#6613) | 2024-08-12 16:16:06 +08:00 |
| blip.py | [Model] Initial support for BLIP-2 (#5920) | 2024-07-27 11:53:07 +00:00 |
| bloom.py | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |
| chameleon.py | [Bugfix][Model] Skip loading lm_head weights if using tie_word_embeddings (#6758) | 2024-07-31 19:49:11 -07:00 |
| chatglm.py | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |
| clip.py | [Core][VLM] Support image embeddings as input (#6613) | 2024-08-12 16:16:06 +08:00 |
| commandr.py | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |
| dbrx.py | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |
| decilm.py | [Model] Add base class for LoRA-supported models (#5018) | 2024-06-27 16:03:04 +08:00 |
| deepseek_v2.py | [Model] Pipeline Parallel Support for DeepSeek v2 (#6519) | 2024-07-23 12:22:09 -07:00 |
| deepseek.py | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |
| falcon.py | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |
| fuyu.py | [Core][VLM] Support image embeddings as input (#6613) | 2024-08-12 16:16:06 +08:00 |
| gemma2.py | [Misc] Support attention logits soft-capping with flash-attn (#7022) | 2024-08-01 13:14:37 -07:00 |
| gemma.py | [Bugfix] Lower gemma's unloaded_params exception to warning (#7002) | 2024-08-01 12:01:07 -07:00 |
| gpt2.py | [ Misc ] non-uniform quantization via compressed-tensors for Llama (#6515) | 2024-07-18 22:39:18 -04:00 |
| gpt_bigcode.py | [Bugfix] GPTBigCodeForCausalLM: Remove lm_head from supported_lora_modules. (#6326) | 2024-07-11 07:05:59 -07:00 |
| gpt_j.py | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |
| gpt_neox.py | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |
| idefics2_vision_model.py | [Model]Refactor MiniCPMV (#7020) | 2024-08-04 08:12:41 +00:00 |
| interfaces.py | [Core] Support serving encoder/decoder models (#7258) | 2024-08-09 10:39:41 +08:00 |
| intern_vit.py | [Model] Refactor and decouple weight loading logic for InternVL2 model (#7067) | 2024-08-02 22:36:14 -07:00 |
| internlm2.py | [Model] Initialize support for InternVL2 series models (#6514) | 2024-07-29 10:16:30 +00:00 |
| internvl.py | [Core][VLM] Support image embeddings as input (#6613) | 2024-08-12 16:16:06 +08:00 |
| jais.py | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |
| jamba.py | [Model][Jamba] Mamba cache single buffer (#6739) | 2024-08-09 10:07:06 -04:00 |
| llama_embedding.py | [Model][Misc] Add e5-mistral-7b-instruct and Embedding API (#3734) | 2024-05-11 11:30:37 -07:00 |
| llama.py | [Core] Support loading GGUF model (#5191) | 2024-08-05 17:54:23 -06:00 |
| llava_next.py | [Core][VLM] Support image embeddings as input (#6613) | 2024-08-12 16:16:06 +08:00 |
| llava.py | [Core][VLM] Support image embeddings as input (#6613) | 2024-08-12 16:16:06 +08:00 |
| medusa.py | [Speculative Decoding] Medusa Implementation with Top-1 proposer (#4978) | 2024-07-09 18:34:02 -07:00 |
| minicpm.py | [Bugfix][Model] Skip loading lm_head weights if using tie_word_embeddings (#6758) | 2024-07-31 19:49:11 -07:00 |
| minicpmv.py | [Model] Rename MiniCPMVQwen2 to MiniCPMV2.6 (#7273) | 2024-08-08 14:02:41 +00:00 |
| mixtral_quant.py | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |
| mixtral.py | [ Misc ] non-uniform quantization via compressed-tensors for Llama (#6515) | 2024-07-18 22:39:18 -04:00 |
| mlp_speculator.py | [Bugfix] Fix speculative decoding with MLPSpeculator with padded vocabulary (#7218) | 2024-08-08 22:08:46 -07:00 |
| mpt.py | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |
| na_vit.py | [Model]Refactor MiniCPMV (#7020) | 2024-08-04 08:12:41 +00:00 |
| nemotron.py | [Model] Support Nemotron models (Nemotron-3, Nemotron-4, Minitron) (#6611) | 2024-07-26 14:33:42 -04:00 |
| olmo.py | [Bugfix][Model] Skip loading lm_head weights if using tie_word_embeddings (#6758) | 2024-07-31 19:49:11 -07:00 |
| opt.py | [Model] Initial support for BLIP-2 (#5920) | 2024-07-27 11:53:07 +00:00 |
| orion.py | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |
| paligemma.py | [Core][VLM] Support image embeddings as input (#6613) | 2024-08-12 16:16:06 +08:00 |
| persimmon.py | [Model] Initialize Fuyu-8B support (#3924) | 2024-07-14 05:27:14 +00:00 |
| phi3_small.py | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |
| phi3v.py | [Core][VLM] Support image embeddings as input (#6613) | 2024-08-12 16:16:06 +08:00 |
| phi.py | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |
| qwen2_moe.py | [Model] Pipeline parallel support for Qwen2 (#6924) | 2024-07-31 18:49:51 -07:00 |
| qwen2.py | [Core] Support loading GGUF model (#5191) | 2024-08-05 17:54:23 -06:00 |
| qwen.py | [Models] Support Qwen model with PP (#6974) | 2024-08-01 12:40:43 -07:00 |
| siglip.py | [Core][VLM] Support image embeddings as input (#6613) | 2024-08-12 16:16:06 +08:00 |
| stablelm.py | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |
| starcoder2.py | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |
| utils.py | [Model] Support SigLIP encoder and alternative decoders for LLaVA models (#7153) | 2024-08-06 16:55:31 +08:00 |
| xverse.py | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00 |