Isotr0py
daef218b55
[Model] Initialize Phi-3-vision support ( #4986 )
2024-06-17 19:34:33 -07:00
Alexander Matveev
d919ecc771
add gptq_marlin test for bug report https://github.com/vllm-project/vllm/issues/5088 ( #5145 )
2024-06-15 13:38:16 -04:00
Cyrus Leung
0e9164b40a
[mypy] Enable type checking for test directory ( #5017 )
2024-06-15 04:45:31 +00:00
Tyler Michael Smith
33e3b37242
[CI/Build] Disable test_fp8.py ( #5508 )
2024-06-13 13:37:48 -07:00
Michael Goin
23ec72fa03
[CI/Build][REDO] Add is_quant_method_supported to control quantization test configurations ( #5466 )
2024-06-13 15:18:08 +00:00
Simon Mo
e3c12bf6d2
Revert "[CI/Build] Add is_quant_method_supported to control quantization test configurations" ( #5463 )
2024-06-12 10:03:24 -07:00
Michael Goin
3dd6853bc8
[CI/Build] Add is_quant_method_supported to control quantization test configurations ( #5253 )
2024-06-12 09:58:02 -07:00
Cyrus Leung
6b29d6fe70
[Model] Initial support for LLaVA-NeXT ( #4199 )
...
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-06-10 12:47:15 +00:00
youkaichao
8ea5e44a43
[CI/Test] improve robustness of test (vllm_runner) ( #5357 )
...
[CI/Test] improve robustness of test by replacing del with context manager (vllm_runner) (#5357 )
2024-06-08 08:59:20 +00:00
youkaichao
9fb900f90c
[CI/Test] improve robustness of test (hf_runner) ( #5347 )
...
[CI/Test] improve robustness of test by replacing del with context manager (hf_runner) (#5347 )
2024-06-07 22:31:32 -07:00
Cyrus Leung
89c920785f
[CI/Build] Update vision tests ( #5307 )
2024-06-06 05:17:18 -05:00
Cyrus Leung
9ba093b4f4
[CI/Build] Simplify model loading for HfRunner ( #5251 )
2024-06-04 10:09:19 -07:00
Yuan
cafb8e06c5
[CI/BUILD] enable intel queue for longer CPU tests ( #4113 )
2024-06-03 10:39:50 -07:00
Cyrus Leung
7a64d24aad
[Core] Support image processor ( #4197 )
2024-06-02 22:56:41 -07:00
Robert Shaw
919770957f
[Bugfix] Fix Mistral v0.3 Weight Loading ( #5005 )
...
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
2024-05-24 12:28:27 +00:00
Cody Yu
a3a73ab069
[Misc] Load FP8 kv-cache scaling factors from checkpoints ( #4893 )
...
The 2nd PR for #4532 .
This PR supports loading FP8 kv-cache scaling factors from a FP8 checkpoint (with .kv_scale parameter).
2024-05-22 13:28:20 -07:00
Woosuk Kwon
b57e6c5949
[Kernel] Add flash-attn back ( #4907 )
2024-05-19 18:11:30 -07:00
Cyrus Leung
f68470e803
[Bugfix][Model] Add base class for vision-language models ( #4809 )
2024-05-19 00:13:33 -07:00
Alexander Matveev
6979ade384
Add GPTQ Marlin 2:4 sparse structured support ( #4790 )
...
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
2024-05-16 12:56:15 -04:00
Jinzhen Lin
99caa49106
[Kernel] add bfloat16 support for gptq marlin kernel ( #4788 )
2024-05-16 09:55:29 -04:00
SangBin Cho
8a7cc254a0
Revert "[Kernel] Use flash-attn for decoding ( #3648 )" ( #4820 )
...
Lora 3 & 4 test seems to have illegal memory access failure after this commit;
[2024-05-14 23:51:18,182 E 22 22] logging.cc:101: Unhandled exception: N3c105ErrorE. what(): CUDA error: an illegal memory access was encountered
<br class="Apple-interchange-newline">
Exmaple: https://buildkite.com/vllm/ci/builds/7382#018f793d-1527-4e1c-ab59-c3a34ec55241
This reverts commit 1356df5 .
FILL IN THE PR DESCRIPTION HERE
FIX #xxxx (link existing issues this PR will resolve)
2024-05-15 11:52:45 +09:00
Stephen Krider
1356df53bd
[Kernel] Use flash-attn for decoding ( #3648 )
...
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
2024-05-13 15:50:33 -07:00
Cyrus Leung
350f9e107f
[CI/Build] Move test_utils.py to tests/utils.py ( #4425 )
...
Since #4335 was merged, I've noticed that the definition of ServerRunner in the tests is the same as in the test for OpenAI API. I have moved the class to the test utilities to avoid code duplication. (Although it only has been repeated twice so far, I will add another similar test suite in #4200 which would duplicate the code a third time)
Also, I have moved the test utilities file (test_utils.py) to under the test directory (tests/utils.py), since none of its code is actually used in the main package. Note that I have added __init__.py to each test subpackage and updated the ray.init() call in the test utilities file in order to relative import tests/utils.py.
2024-05-13 23:50:09 +09:00
Robert Shaw
a709e87a4f
[CI/Build] Tweak Marlin Nondeterminism Issues ( #4713 )
2024-05-12 17:46:31 -07:00
Chang Su
e254497b66
[Model][Misc] Add e5-mistral-7b-instruct and Embedding API ( #3734 )
2024-05-11 11:30:37 -07:00
SangBin Cho
f6a593093a
[CI] Make mistral tests pass ( #4596 )
2024-05-08 08:44:35 -07:00
alexm-nm
7038e8b803
[Kernel] Support running GPTQ 8-bit models in Marlin ( #4533 )
2024-05-02 12:56:22 -04:00
Jee Li
d6f4bd7cdd
[Misc]Add customized information for models ( #4132 )
2024-04-30 21:18:14 -07:00
Robert Shaw
111815d482
[Kernel] Support Fp8 Checkpoints (Dynamic + Static) ( #4332 )
...
Co-authored-by: Philipp Moritz <pcmoritz@gmail.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: mgoin <michael@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
2024-04-30 21:46:12 +00:00
Robert Shaw
73c8d677e5
[Kernel] Marlin Expansion: Support AutoGPTQ Models with Marlin ( #3922 )
...
Co-authored-by: alexm <alexm@neuralmagic.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
2024-04-29 09:35:34 -07:00
James Fleming
2b7949c1c2
AQLM CUDA support ( #3287 )
...
Co-authored-by: mgoin <michael@neuralmagic.com>
2024-04-23 13:59:33 -04:00
Michael Goin
53b018edcb
[Bugfix] Get available quantization methods from quantization registry ( #4098 )
2024-04-18 00:21:55 -07:00
SangBin Cho
67b4221a61
[Core][5/N] Fully working chunked prefill e2e ( #3884 )
2024-04-10 17:56:48 -07:00
youkaichao
95baec828f
[Core] enable out-of-tree model register ( #3871 )
2024-04-06 17:11:41 -07:00
Robert Shaw
563c1d7ec5
[CI/Build] Make Marlin Tests Green ( #3753 )
2024-03-30 19:18:34 -07:00
SangBin Cho
26422e477b
[Test] Make model tests run again and remove --forked from pytest ( #3631 )
...
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-03-28 21:06:40 -07:00
xwjiang2010
64172a976c
[Feature] Add vision language model support. ( #3042 )
2024-03-25 14:16:30 -07:00
SangBin Cho
01bfb22b41
[CI] Try introducing isort. ( #3495 )
2024-03-25 07:59:47 -07:00
Roy
ea5f14e6ff
[Bugfix][Model] Fix Qwen2 ( #3554 )
2024-03-22 00:18:58 +00:00
Zhuohan Li
2f8844ba08
Re-enable the 80 char line width limit ( #3305 )
2024-03-10 19:49:14 -07:00
Robert Shaw
c0c2335ce0
Integrate Marlin Kernels for Int4 GPTQ inference ( #2497 )
...
Co-authored-by: Robert Shaw <114415538+rib-2@users.noreply.github.com>
Co-authored-by: alexm <alexm@neuralmagic.com>
2024-03-01 12:47:51 -08:00
Seonghyeon
bfdcfa6a05
Support starcoder2 architecture ( #3089 )
2024-02-29 00:51:48 -08:00
Isotr0py
ab3a5a8259
Support OLMo models. ( #2832 )
2024-02-18 21:05:15 -08:00
Hyunsung Lee
e1957c6ebd
Add StableLM3B model ( #2372 )
2024-01-16 20:32:40 -08:00
avideci
de60a3fb93
Added DeciLM-7b and DeciLM-7b-instruct ( #2062 )
2023-12-19 02:29:33 -08:00
Woosuk Kwon
f8c688d746
[Minor] Add Phi 2 to supported models ( #2159 )
2023-12-17 02:54:57 -08:00
Woosuk Kwon
f1c8520146
[BugFix] Fix input positions for long context with sliding window ( #2088 )
2023-12-13 12:28:13 -08:00
maximzubkov
521b35f799
Support Microsoft Phi 1.5 ( #1664 )
2023-11-16 14:28:39 -08:00
Woosuk Kwon
9524867701
Add Mistral 7B to test_models ( #1366 )
2023-10-16 17:49:54 -07:00
Woosuk Kwon
32b6816e55
Add tests for models ( #922 )
2023-09-01 11:19:43 +09:00