vllm/models at 8a7cc254a064b8d42bf4de7a9c3f29552240dfd9 - vllm

SangBin Cho 8a7cc254a0 Revert "[Kernel] Use flash-attn for decoding (#3648)" (#4820) LoRA 3 & 4 tests seem to have an illegal memory access failure after this commit: [2024-05-14 23:51:18,182 E 22 22] logging.cc:101: Unhandled exception: N3c105ErrorE. what(): CUDA error: an illegal memory access was encountered. Example: https://buildkite.com/vllm/ci/builds/7382#018f793d-1527-4e1c-ab59-c3a34ec55241 This reverts commit `1356df5`.		2024-05-15 11:52:45 +09:00
..
__init__.py	[CI/Build] Move `test_utils.py` to `tests/utils.py` (#4425)	2024-05-13 23:50:09 +09:00
test_aqlm.py	AQLM CUDA support (#3287)	2024-04-23 13:59:33 -04:00
test_big_models.py	Revert "[Kernel] Use flash-attn for decoding (#3648)" (#4820)	2024-05-15 11:52:45 +09:00
test_embedding.py	[Model][Misc] Add e5-mistral-7b-instruct and Embedding API (#3734)	2024-05-11 11:30:37 -07:00
test_fp8.py	Revert "[Kernel] Use flash-attn for decoding (#3648)" (#4820)	2024-05-15 11:52:45 +09:00
test_gptq_marlin.py	[CI/Build] Move `test_utils.py` to `tests/utils.py` (#4425)	2024-05-13 23:50:09 +09:00
test_llava.py	[Test] Make model tests run again and remove --forked from pytest (#3631)	2024-03-28 21:06:40 -07:00
test_marlin.py	[CI/Build] Move `test_utils.py` to `tests/utils.py` (#4425)	2024-05-13 23:50:09 +09:00
test_mistral.py	[CI/Build] Move `test_utils.py` to `tests/utils.py` (#4425)	2024-05-13 23:50:09 +09:00
test_models.py	[Misc]Add customized information for models (#4132)	2024-04-30 21:18:14 -07:00
test_oot_registration.py	[Core] enable out-of-tree model register (#3871)	2024-04-06 17:11:41 -07:00
utils.py	[Kernel] Marlin Expansion: Support AutoGPTQ Models with Marlin (#3922)	2024-04-29 09:35:34 -07:00