Commit Graph

72 Commits

Author SHA1 Message Date
Cyrus Leung
38ef94888a
[CI/Build] Remove "boardwalk" image asset (#6460) 2024-07-16 08:59:36 -07:00
Mor Zusman
9ad32dacd9
[BugFix][Model] Jamba - Handle aborted requests, Add tests and fix cleanup bug (#6425)
Co-authored-by: Mor Zusman <morz@ai21.com>
2024-07-16 01:32:55 +00:00
Isotr0py
540c0368b1
[Model] Initialize Fuyu-8B support (#3924)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-14 05:27:14 +00:00
Cyrus Leung
024ad87cdc
[Bugfix] Fix dtype mismatch in PaliGemma (#6367) 2024-07-12 08:22:18 -07:00
Robert Shaw
aea19f0989
[ Misc ] Support Models With Bias in compressed-tensors integration (#6356) 2024-07-12 11:11:29 -04:00
xwjiang2010
1df43de9bb
[bug fix] Fix llava next feature size calculation. (#6339)
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
2024-07-11 17:21:10 +00:00
tomeras91
ddc369fba1
[Bugfix] Mamba cache Cuda Graph padding (#6214) 2024-07-08 11:25:51 -07:00
Roger Wang
6206dcb29e
[Model] Add PaliGemma (#5189)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-07-07 09:25:50 +08:00
Cyrus Leung
3dd507083f
[CI/Build] Cleanup VLM tests (#6107) 2024-07-03 18:58:18 -07:00
xwjiang2010
d9e98f42e4
[vlm] Remove vision language config. (#6089)
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-03 22:14:16 +00:00
Cyrus Leung
9831aec49f
[Core] Dynamic image size support for VLMs (#5276)
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: ywang96 <ywang@roblox.com>
Co-authored-by: xwjiang2010 <87673679+xwjiang2010@users.noreply.github.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-07-02 20:34:00 -07:00
Mor Zusman
9d6a8daa87
[Model] Jamba support (#4115)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: Erez Schwartz <erezs@ai21.com>
Co-authored-by: Mor Zusman <morz@ai21.com>
Co-authored-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>
Co-authored-by: Tomer Asida <tomera@ai21.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
Co-authored-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
2024-07-02 23:11:29 +00:00
xwjiang2010
98d6682cd1
[VLM] Remove image_input_type from VLM config (#5852)
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-02 07:57:09 +00:00
youkaichao
2be6955a3f
[ci][distributed] fix device count call
[ci][distributed] fix some cuda init that makes it necessary to use spawn (#5991)
2024-06-30 08:06:13 +00:00
Cyrus Leung
cff6a1fec1
[CI/Build] Reuse code for checking output consistency (#5988) 2024-06-30 11:44:25 +08:00
Cyrus Leung
99397da534
[CI/Build] Add TP test for vision models (#5892) 2024-06-29 15:45:54 +00:00
Robert Shaw
8dbfcd35bf
[ CI/Build ] Added E2E Test For Compressed Tensors (#5839)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Co-authored-by: Robert Shaw <rshaw@neuralmagic>
2024-06-29 21:12:58 +08:00
Woosuk Kwon
580353da93
[Bugfix] Fix precisions in Gemma 1 (#5913) 2024-06-29 03:10:21 +00:00
Cyrus Leung
6984c02a27
[CI/Build] Refactor image test assets (#5821) 2024-06-26 01:02:34 -07:00
Isotr0py
edd5fe5fa2
[Bugfix] Add phi3v resize for dynamic shape and fix torchvision requirement (#5772) 2024-06-24 12:11:53 +08:00
Chang Su
c35e4a3dd7
[BugFix] Fix test_phi3v.py (#5725) 2024-06-21 04:45:34 +00:00
Roger Wang
4ad7b53e59
[CI/Build][Misc] Update Pytest Marker for VLMs (#5623) 2024-06-18 13:10:04 +00:00
Isotr0py
daef218b55
[Model] Initialize Phi-3-vision support (#4986) 2024-06-17 19:34:33 -07:00
Alexander Matveev
d919ecc771
add gptq_marlin test for bug report https://github.com/vllm-project/vllm/issues/5088 (#5145) 2024-06-15 13:38:16 -04:00
Cyrus Leung
0e9164b40a
[mypy] Enable type checking for test directory (#5017) 2024-06-15 04:45:31 +00:00
Tyler Michael Smith
33e3b37242
[CI/Build] Disable test_fp8.py (#5508) 2024-06-13 13:37:48 -07:00
Michael Goin
23ec72fa03
[CI/Build][REDO] Add is_quant_method_supported to control quantization test configurations (#5466) 2024-06-13 15:18:08 +00:00
Simon Mo
e3c12bf6d2
Revert "[CI/Build] Add is_quant_method_supported to control quantization test configurations" (#5463) 2024-06-12 10:03:24 -07:00
Michael Goin
3dd6853bc8
[CI/Build] Add is_quant_method_supported to control quantization test configurations (#5253) 2024-06-12 09:58:02 -07:00
Cyrus Leung
6b29d6fe70
[Model] Initial support for LLaVA-NeXT (#4199)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-06-10 12:47:15 +00:00
youkaichao
8ea5e44a43
[CI/Test] improve robustness of test (vllm_runner) (#5357)
[CI/Test] improve robustness of test by replacing del with context manager (vllm_runner) (#5357)
2024-06-08 08:59:20 +00:00
youkaichao
9fb900f90c
[CI/Test] improve robustness of test (hf_runner) (#5347)
[CI/Test] improve robustness of test by replacing del with context manager (hf_runner) (#5347)
2024-06-07 22:31:32 -07:00
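The two commits above replace manual `del` cleanup in the test runners with context managers. A minimal sketch of that pattern, using a hypothetical runner class rather than vLLM's actual HfRunner/VllmRunner API:

```python
import gc


class DummyRunner:
    """Hypothetical stand-in for a test model runner; not vLLM's actual class."""

    def __init__(self, model_name: str):
        # Imagine heavyweight GPU state being allocated here.
        self.model_name = model_name

    def generate(self, prompt: str) -> str:
        return f"{self.model_name}: {prompt}"

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Cleanup runs even if the test body raises, which a bare
        # `del runner` at the end of the test does not guarantee.
        gc.collect()
        return False


def test_generate():
    # The `with` block replaces the old `del runner` cleanup.
    with DummyRunner("dummy-model") as runner:
        assert "hello" in runner.generate("hello")
```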
Cyrus Leung
89c920785f
[CI/Build] Update vision tests (#5307) 2024-06-06 05:17:18 -05:00
Cyrus Leung
9ba093b4f4
[CI/Build] Simplify model loading for HfRunner (#5251) 2024-06-04 10:09:19 -07:00
Yuan
cafb8e06c5
[CI/BUILD] enable intel queue for longer CPU tests (#4113) 2024-06-03 10:39:50 -07:00
Cyrus Leung
7a64d24aad
[Core] Support image processor (#4197) 2024-06-02 22:56:41 -07:00
Robert Shaw
919770957f
[Bugfix] Fix Mistral v0.3 Weight Loading (#5005)
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
2024-05-24 12:28:27 +00:00
Cody Yu
a3a73ab069
[Misc] Load FP8 kv-cache scaling factors from checkpoints (#4893)
The 2nd PR for #4532.

This PR supports loading FP8 kv-cache scaling factors from an FP8 checkpoint (with a .kv_scale parameter).
2024-05-22 13:28:20 -07:00
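As a rough sketch of what the commit above describes, the per-layer scaling factors could be collected from a checkpoint's state dict by filtering on the `.kv_scale` suffix (the function name and key layout here are assumptions, not vLLM's loader code):

```python
import torch


def collect_kv_scales(checkpoint_path: str) -> dict[str, float]:
    """Gather per-layer kv-cache scaling factors from an FP8 checkpoint.

    Assumes scaling factors are stored as scalar tensors under keys ending
    in ".kv_scale"; the exact key naming is an illustrative assumption.
    """
    state_dict = torch.load(checkpoint_path, map_location="cpu")
    return {
        name: float(tensor)
        for name, tensor in state_dict.items()
        if name.endswith(".kv_scale")
    }
```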
Woosuk Kwon
b57e6c5949
[Kernel] Add flash-attn back (#4907) 2024-05-19 18:11:30 -07:00
Cyrus Leung
f68470e803
[Bugfix][Model] Add base class for vision-language models (#4809) 2024-05-19 00:13:33 -07:00
Alexander Matveev
6979ade384
Add GPTQ Marlin 2:4 sparse structured support (#4790)
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
2024-05-16 12:56:15 -04:00
Jinzhen Lin
99caa49106
[Kernel] add bfloat16 support for gptq marlin kernel (#4788) 2024-05-16 09:55:29 -04:00
SangBin Cho
8a7cc254a0
Revert "[Kernel] Use flash-attn for decoding (#3648)" (#4820)
Lora 3 & 4 test seems to have illegal memory access failure after this commit;

[2024-05-14 23:51:18,182 E 22 22] logging.cc:101: Unhandled exception: N3c105ErrorE. what(): CUDA error: an illegal memory access was encountered
<br class="Apple-interchange-newline">
Exmaple: https://buildkite.com/vllm/ci/builds/7382#018f793d-1527-4e1c-ab59-c3a34ec55241

This reverts commit 1356df5.
2024-05-15 11:52:45 +09:00
Stephen Krider
1356df53bd
[Kernel] Use flash-attn for decoding (#3648)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
2024-05-13 15:50:33 -07:00
Cyrus Leung
350f9e107f
[CI/Build] Move test_utils.py to tests/utils.py (#4425)
Since #4335 was merged, I've noticed that the definition of ServerRunner in the tests is the same as in the test for the OpenAI API, so I have moved the class into the test utilities to avoid code duplication. (Although it has only been repeated twice so far, I will add another similar test suite in #4200, which would duplicate the code a third time.)

Also, I have moved the test utilities file (test_utils.py) under the test directory (tests/utils.py), since none of its code is actually used in the main package. Note that I have added __init__.py to each test subpackage and updated the ray.init() call in the test utilities file so that tests/utils.py can be imported relative to the test packages.
2024-05-13 23:50:09 +09:00
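As an illustration of the layout change described above (the module path and helper usage are assumptions, not the actual test code), a test file in a subpackage can then reach the shared helpers through a package-relative import:

```python
# tests/entrypoints/test_openai_server.py  (hypothetical location)
# With tests/__init__.py and tests/entrypoints/__init__.py in place, the shared
# helpers in tests/utils.py are importable relative to the tests package:
from ..utils import ServerRunner  # ServerRunner is the helper class mentioned above
```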
Robert Shaw
a709e87a4f
[CI/Build] Tweak Marlin Nondeterminism Issues (#4713) 2024-05-12 17:46:31 -07:00
Chang Su
e254497b66
[Model][Misc] Add e5-mistral-7b-instruct and Embedding API (#3734) 2024-05-11 11:30:37 -07:00
SangBin Cho
f6a593093a
[CI] Make mistral tests pass (#4596) 2024-05-08 08:44:35 -07:00
alexm-nm
7038e8b803
[Kernel] Support running GPTQ 8-bit models in Marlin (#4533) 2024-05-02 12:56:22 -04:00
Jee Li
d6f4bd7cdd
[Misc]Add customized information for models (#4132) 2024-04-30 21:18:14 -07:00