Mor Zusman
7fc23be81c
[Kernel] W8A16 Int8 inside FusedMoE ( #7415 )
2024-08-16 10:06:51 -07:00
Cyrus Leung
3f674a49b5
[VLM][Core] Support profiling with multiple multi-modal inputs per prompt ( #7126 )
2024-08-14 17:55:42 +00:00
youkaichao
ea49e6a3c8
[misc][ci] fix cpu test with plugins ( #7489 )
2024-08-13 19:27:46 -07:00
youkaichao
16422ea76f
[misc][plugin] add plugin system implementation ( #7426 )
2024-08-13 16:24:17 -07:00
Cyrus Leung
7025b11d94
[Bugfix] Fix weight loading for Chameleon when TP>1 ( #7410 )
2024-08-13 05:33:41 +00:00
Roger Wang
e6e42e4b17
[Core][VLM] Support image embeddings as input ( #6613 )
2024-08-12 16:16:06 +08:00
Isotr0py
4c5d8e8ea9
[Bugfix] Fix phi3v batch inference when images have different aspect ratio ( #7392 )
2024-08-10 16:19:33 +00:00
Cyrus Leung
7eb4a51c5f
[Core] Support serving encoder/decoder models ( #7258 )
2024-08-09 10:39:41 +08:00
Isotr0py
b764547616
[Bugfix] Fix input processor for InternVL2 model ( #7164 )
...
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-08-07 09:32:07 -07:00
afeldman-nm
fd95e026e0
[Core] Subclass ModelRunner to support cross-attention & encoder sequences (towards eventual encoder/decoder model support) ( #4942 )
...
Co-authored-by: Andrew Feldman <afeld2012@gmail.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2024-08-06 16:51:47 -04:00
Cyrus Leung
1f26efbb3a
[Model] Support SigLIP encoder and alternative decoders for LLaVA models ( #7153 )
...
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-08-06 16:55:31 +08:00
Isotr0py
360bd67cf0
[Core] Support loading GGUF model ( #5191 )
...
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-08-05 17:54:23 -06:00
Alphi
7b86e7c9cd
[Model] Add multi-image support for minicpmv ( #7122 )
...
Co-authored-by: hezhihui <hzh7269@modelbest.cn>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-08-05 09:23:17 +08:00
Michael Goin
fb3db61688
[CI/Build] Remove sparseml requirement from testing ( #7037 )
2024-08-01 12:00:51 -07:00
Cyrus Leung
daed30c4a9
[Bugfix] Fix feature size calculation for LLaVA-NeXT ( #6982 )
2024-07-31 23:46:17 +08:00
Cyrus Leung
f230cc2ca6
[Bugfix] Fix broadcasting logic for multi_modal_kwargs ( #6836 )
2024-07-31 10:38:45 +08:00
Isotr0py
7cbd9ec7a9
[Model] Initialize support for InternVL2 series models ( #6514 )
...
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-29 10:16:30 +00:00
Cyrus Leung
1ad86acf17
[Model] Initial support for BLIP-2 ( #5920 )
...
Co-authored-by: ywang96 <ywang@roblox.com>
2024-07-27 11:53:07 +00:00
Joe
14dbd5a767
[Model] H2O Danube3-4b ( #6451 )
2024-07-26 20:47:50 -07:00
Alphi
9e169a4c61
[Model] Adding support for MiniCPM-V ( #4087 )
2024-07-24 20:59:30 -07:00
Roger Wang
22fa2e35cb
[VLM][Model] Support image input for Chameleon ( #6633 )
2024-07-22 23:50:48 -07:00
Matt Wong
06d6c5fe9f
[Bugfix][CI/Build][Hardware][AMD] Fix AMD tests, add HF cache, update CK FA, add partially supported model notes ( #6543 )
2024-07-20 09:39:07 -07:00
Cyrus Leung
38ef94888a
[CI/Build] Remove "boardwalk" image asset ( #6460 )
2024-07-16 08:59:36 -07:00
Mor Zusman
9ad32dacd9
[BugFix][Model] Jamba - Handle aborted requests, Add tests and fix cleanup bug ( #6425 )
...
Co-authored-by: Mor Zusman <morz@ai21.com>
2024-07-16 01:32:55 +00:00
Isotr0py
540c0368b1
[Model] Initialize Fuyu-8B support ( #3924 )
...
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-14 05:27:14 +00:00
Cyrus Leung
024ad87cdc
[Bugfix] Fix dtype mismatch in PaliGemma ( #6367 )
2024-07-12 08:22:18 -07:00
Robert Shaw
aea19f0989
[ Misc ] Support Models With Bias in compressed-tensors integration ( #6356 )
2024-07-12 11:11:29 -04:00
xwjiang2010
1df43de9bb
[bug fix] Fix llava next feature size calculation. ( #6339 )
...
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
2024-07-11 17:21:10 +00:00
tomeras91
ddc369fba1
[Bugfix] Mamba cache Cuda Graph padding ( #6214 )
2024-07-08 11:25:51 -07:00
Roger Wang
6206dcb29e
[Model] Add PaliGemma ( #5189 )
...
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-07-07 09:25:50 +08:00
Cyrus Leung
3dd507083f
[CI/Build] Cleanup VLM tests ( #6107 )
2024-07-03 18:58:18 -07:00
xwjiang2010
d9e98f42e4
[vlm] Remove vision language config. ( #6089 )
...
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-03 22:14:16 +00:00
Cyrus Leung
9831aec49f
[Core] Dynamic image size support for VLMs ( #5276 )
...
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: ywang96 <ywang@roblox.com>
Co-authored-by: xwjiang2010 <87673679+xwjiang2010@users.noreply.github.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-07-02 20:34:00 -07:00
Mor Zusman
9d6a8daa87
[Model] Jamba support ( #4115 )
...
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: Erez Schwartz <erezs@ai21.com>
Co-authored-by: Mor Zusman <morz@ai21.com>
Co-authored-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>
Co-authored-by: Tomer Asida <tomera@ai21.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
Co-authored-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
2024-07-02 23:11:29 +00:00
xwjiang2010
98d6682cd1
[VLM] Remove image_input_type from VLM config ( #5852 )
...
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-02 07:57:09 +00:00
youkaichao
2be6955a3f
[ci][distributed] fix device count call
...
[ci][distributed] fix some cuda init that makes it necessary to use spawn (#5991 )
2024-06-30 08:06:13 +00:00
Cyrus Leung
cff6a1fec1
[CI/Build] Reuse code for checking output consistency ( #5988 )
2024-06-30 11:44:25 +08:00
Cyrus Leung
99397da534
[CI/Build] Add TP test for vision models ( #5892 )
2024-06-29 15:45:54 +00:00
Robert Shaw
8dbfcd35bf
[ CI/Build ] Added E2E Test For Compressed Tensors ( #5839 )
...
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Co-authored-by: Robert Shaw <rshaw@neuralmagic>
2024-06-29 21:12:58 +08:00
Woosuk Kwon
580353da93
[Bugfix] Fix precisions in Gemma 1 ( #5913 )
2024-06-29 03:10:21 +00:00
Cyrus Leung
6984c02a27
[CI/Build] Refactor image test assets ( #5821 )
2024-06-26 01:02:34 -07:00
Isotr0py
edd5fe5fa2
[Bugfix] Add phi3v resize for dynamic shape and fix torchvision requirement ( #5772 )
2024-06-24 12:11:53 +08:00
Chang Su
c35e4a3dd7
[BugFix] Fix test_phi3v.py ( #5725 )
2024-06-21 04:45:34 +00:00
Roger Wang
4ad7b53e59
[CI/Build][Misc] Update Pytest Marker for VLMs ( #5623 )
2024-06-18 13:10:04 +00:00
Isotr0py
daef218b55
[Model] Initialize Phi-3-vision support ( #4986 )
2024-06-17 19:34:33 -07:00
Alexander Matveev
d919ecc771
add gptq_marlin test for bug report https://github.com/vllm-project/vllm/issues/5088 ( #5145 )
2024-06-15 13:38:16 -04:00
Cyrus Leung
0e9164b40a
[mypy] Enable type checking for test directory ( #5017 )
2024-06-15 04:45:31 +00:00
Tyler Michael Smith
33e3b37242
[CI/Build] Disable test_fp8.py ( #5508 )
2024-06-13 13:37:48 -07:00
Michael Goin
23ec72fa03
[CI/Build][REDO] Add is_quant_method_supported to control quantization test configurations ( #5466 )
2024-06-13 15:18:08 +00:00
Simon Mo
e3c12bf6d2
Revert "[CI/Build] Add is_quant_method_supported to control quantization test configurations" ( #5463 )
2024-06-12 10:03:24 -07:00