Commit Graph

143 Commits

Author SHA1 Message Date
Mor Zusman
f13a07b1f8
[Kernel][Model] Varlen prefill + Prefill chunking support for mamba kernels and Jamba model (#8533) 2024-09-29 17:35:58 -04:00
Cyrus Leung
26a68d5d7e
[CI/Build] Add test decorator for minimum GPU memory (#8925) 2024-09-29 02:50:51 +00:00
Cyrus Leung
e1a3f5e831
[CI/Build] Update models tests & examples (#8874)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-09-28 09:54:35 -07:00
Isotr0py
6d792d2f31
[Bugfix][VLM] Fix Fuyu batching inference with max_num_seqs>1 (#8892) 2024-09-27 01:15:58 -07:00
Nick Hill
4b377d6feb
[BugFix] Fix test breakages from transformers 4.45 upgrade (#8829) 2024-09-26 16:46:43 -07:00
Chen Zhang
770ec6024f
[Model] Add support for the multi-modal Llama 3.2 model (#8811)
Co-authored-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: Chang Su <chang.s.su@oracle.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-09-25 13:29:32 -07:00
Alex Brooks
8ff7ced996
[Model] Expose Phi3v num_crops as a mm_processor_kwarg (#8658)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-24 07:36:46 +00:00
Alex Brooks
9b8c8ba119
[Core][Frontend] Support Passing Multimodal Processor Kwargs (#8657)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2024-09-23 07:44:48 +00:00
litianjian
5b59532760
[Model][VLM] Add LLaVA-Onevision model support (#8486)
Co-authored-by: litianjian <litianjian@bytedance.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-22 10:51:44 -07:00
Patrick von Platen
b4e4eda92e
[Bugfix][Core] Fix tekken edge case for mistral tokenizer (#8640) 2024-09-20 14:33:03 -07:00
afeldman-nm
a8c1d161a7
[Core] *Prompt* logprobs support in Multi-step (#8199) 2024-09-18 08:38:43 -07:00
Cyrus Leung
6ffa3f314c
[CI/Build] Avoid CUDA initialization (#8534) 2024-09-18 10:38:11 +00:00
Patrick von Platen
a54ed80249
[Model] Add mistral function calling format to all models loaded with "mistral" format (#8515)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-09-17 17:50:37 +00:00
ywfang
8a0cf1ddc3
[Model] support minicpm3 (#8297)
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-14 14:50:26 +00:00
Cyrus Leung
a84e598e21
[CI/Build] Reorganize models tests (#7820) 2024-09-13 10:20:06 -07:00
Cyrus Leung
8427550488
[CI/Build] Update pixtral tests to use JSON (#8436) 2024-09-13 03:47:52 +00:00
Patrick von Platen
d31174a4e1
[Hotfix][Pixtral] Fix multiple images bugs (#8415) 2024-09-12 15:21:51 -07:00
Alex Brooks
c6202daeed
[Model] Support multiple images for qwen-vl (#8247)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-12 10:10:54 -07:00
Isotr0py
e56bf27741
[Bugfix] Fix InternVL2 inference with various num_patches (#8375)
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-12 10:10:35 -07:00
Patrick von Platen
d394787e52
Pixtral (#8377)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-09-11 14:41:55 -07:00
bnellnm
73202dbe77
[Kernel][Misc] register ops to prevent graph breaks (#6917)
Co-authored-by: Sage Moore <sage@neuralmagic.com>
2024-09-11 12:52:19 -07:00
Yang Fan
3b7fea770f
[Model][VLM] Add Qwen2-VL model support (#7905)
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-11 09:31:19 -07:00
Yangshen⚡Deng
6a512a00df
[model] Support for Llava-Next-Video model (#7559)
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-09-10 22:21:36 -07:00
Pavani Majety
efcf946a15
[Hardware][NV] Add support for ModelOpt static scaling checkpoints. (#6112) 2024-09-11 00:38:40 -04:00
Isotr0py
e807125936
[Model][VLM] Support multi-images inputs for InternVL2 models (#8201) 2024-09-07 16:38:23 +08:00
Cyrus Leung
2f707fcb35
[Model] Multi-input support for LLaVA (#8238) 2024-09-07 02:57:24 +00:00
Patrick von Platen
29f49cd6e3
[Model] Allow loading from original Mistral format (#8168)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-09-06 17:02:05 -06:00
Alex Brooks
9da25a88aa
[MODEL] Qwen Multimodal Support (Qwen-VL / Qwen-VL-Chat) (#8029)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-05 12:48:10 +00:00
Cody Yu
2ad2e5608e
[MISC] Consolidate FP8 kv-cache tests (#8131) 2024-09-04 18:53:25 +00:00
Peter Salas
2be8ec6e71
[Model] Add Ultravox support for multiple audio chunks (#7963) 2024-09-04 04:38:21 +00:00
Shawn Tan
f8d60145b4
[Model] Add Granite model (#7436)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2024-09-01 18:37:18 -07:00
Roger Wang
5b86b19954
[Misc] Optional installation of audio related packages (#8063) 2024-09-01 14:46:57 -07:00
Pavani Majety
622f8abff8
[Bugfix] bugfix and add model test for flashinfer fp8 kv cache. (#8013) 2024-08-30 22:18:50 -07:00
Wenxiang
1248e8506a
[Model] Adding support for MSFT Phi-3.5-MoE (#7729)
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Zeqi Lin <zelin@microsoft.com>
Co-authored-by: Zeqi Lin <Zeqi.Lin@microsoft.com>
2024-08-30 13:42:57 -06:00
Jungho Christopher Cho
f97be32d1d
[VLM][Model] TP support for ViTs (#7186)
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-08-30 08:19:27 -07:00
afeldman-nm
428dd1445e
[Core] Logprobs support in Multi-step (#7652) 2024-08-29 19:19:08 -07:00
Cyrus Leung
4abed65c58
[VLM] Disallow overflowing max_model_len for multimodal models (#7998) 2024-08-29 17:49:04 -07:00
zifeitong
5340a2dccf
[Model] Add multi-image input support for LLaVA-Next offline inference (#7230) 2024-08-28 07:09:02 +08:00
Isotr0py
9db642138b
[CI/Build][VLM] Cleanup multiple images inputs model test (#7897) 2024-08-27 15:28:30 +00:00
Patrick von Platen
6fc4e6e07a
[Model] Add Mistral Tokenization to improve robustness and chat encoding (#7739) 2024-08-27 12:40:02 +00:00
Isotr0py
2059b8d9ca
[Misc] Remove snapshot_download usage in InternVL2 test (#7835) 2024-08-25 15:53:09 +00:00
Isotr0py
8aaf3d5347
[Model][VLM] Support multi-images inputs for Phi-3-vision models (#7783) 2024-08-25 11:51:20 +00:00
zifeitong
80162c44b1
[Bugfix] Fix Phi-3v crash when input images are of certain sizes (#7840) 2024-08-24 18:16:24 -07:00
zifeitong
df1a21131d
[Model] Fix Phi-3.5-vision-instruct 'num_crops' issue (#7710) 2024-08-22 09:36:24 +08:00
Peter Salas
1ca0d4f86b
[Model] Add UltravoxModel and UltravoxConfig (#7615) 2024-08-21 22:49:39 +00:00
Isotr0py
12e1c65bc9
[Model] Add AWQ quantization support for InternVL2 model (#7187) 2024-08-20 23:18:57 -07:00
Isotr0py
aae6927be0
[VLM][Model] Add test for InternViT vision encoder (#7409) 2024-08-20 23:10:20 +08:00
Isotr0py
7601cb044d
[Core] Support tensor parallelism for GGUF quantization (#7520) 2024-08-19 17:30:14 -04:00
Alex Brooks
40e1360bb6
[CI/Build] Add text-only test for Qwen models (#7475)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2024-08-19 07:43:46 +08:00
Mor Zusman
7fc23be81c
[Kernel] W8A16 Int8 inside FusedMoE (#7415) 2024-08-16 10:06:51 -07:00