vllm/tests
SangBin Cho 2e9a2227ec
[Lora] Support long context lora (#4787)
Currently we need to call the rotary embedding kernel once per LoRA, which makes it hard to serve multiple long-context LoRAs. Add a batched rotary embedding kernel and pipe it through.

It replaces the rotary embedding layer with one that is aware of multiple cos-sin caches, one per scaling factor.

Follow-up of https://github.com/vllm-project/vllm/pull/3095/files
2024-05-18 16:05:23 +09:00
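A minimal PyTorch sketch of the idea described above: precompute one cos-sin cache per linear scaling factor, concatenate them into a single flat cache, and let each token index into its factor's region so one batched call can serve requests with mixed scaling factors. Names and shapes here are illustrative assumptions, not vLLM's actual kernel or layer API.

```python
import torch


class BatchedRotaryEmbedding(torch.nn.Module):
    """Illustrative rotary embedding with one cos-sin cache per scaling factor.

    All per-factor caches are concatenated into a single buffer so that a
    single batched lookup can serve tokens using different LoRA scaling
    factors (hypothetical sketch, not vLLM's implementation).
    """

    def __init__(self, head_dim: int, max_pos: int, base: float,
                 scaling_factors: list[float]):
        super().__init__()
        inv_freq = 1.0 / (base ** (
            torch.arange(0, head_dim, 2).float() / head_dim))
        caches = []
        self.offsets = {}  # scaling factor -> start offset in the flat cache
        offset = 0
        for factor in scaling_factors:
            length = int(max_pos * factor)
            t = torch.arange(length).float() / factor  # scaled positions
            freqs = torch.outer(t, inv_freq)
            caches.append(torch.cat([freqs.cos(), freqs.sin()], dim=-1))
            self.offsets[factor] = offset
            offset += length
        # One flat cache; token i reads row offsets[factor_i] + position_i.
        self.register_buffer("cos_sin_cache", torch.cat(caches, dim=0))

    def forward(self, positions: torch.Tensor, offsets: torch.Tensor,
                q: torch.Tensor) -> torch.Tensor:
        # positions, offsets: [num_tokens]; q: [num_tokens, head_dim]
        cos_sin = self.cos_sin_cache[positions + offsets]
        cos, sin = cos_sin.chunk(2, dim=-1)
        q1, q2 = q.chunk(2, dim=-1)
        # Neox-style rotation applied with the per-token cache row.
        return torch.cat([q1 * cos - q2 * sin, q2 * cos + q1 * sin], dim=-1)
```

At runtime each token would carry the cache offset of its LoRA's scaling factor, so requests with different long-context scaling factors can share one batched kernel launch instead of one rotary embedding call per LoRA.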
async_engine [CI/Build] Move test_utils.py to tests/utils.py (#4425) 2024-05-13 23:50:09 +09:00
basic_correctness [Scheduler] Warning upon preemption and Swapping (#4647) 2024-05-13 23:50:44 +09:00
core [Scheduler] Warning upon preemption and Swapping (#4647) 2024-05-13 23:50:44 +09:00
distributed [Core][Distributed] remove graph mode function (#4818) 2024-05-16 10:59:52 -07:00
engine [Build/CI] Extending the set of AMD tests with Regression, Basic Correctness, Distributed, Engine, Llava Tests (#4797) 2024-05-16 20:58:25 -07:00
entrypoints [Frontend] Support OpenAI batch file format (#4794) 2024-05-15 19:13:36 -04:00
fp8_kv Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290) 2024-04-03 14:15:55 -07:00
kernels [Bugfix] fix rope error when load models with different dtypes (#4835) 2024-05-17 18:43:34 +09:00
lora [Lora] Support long context lora (#4787) 2024-05-18 16:05:23 +09:00
metrics [CI/Build] Move test_utils.py to tests/utils.py (#4425) 2024-05-13 23:50:09 +09:00
model_executor [CI/Build] Move test_utils.py to tests/utils.py (#4425) 2024-05-13 23:50:09 +09:00
models Add GPTQ Marlin 2:4 sparse structured support (#4790) 2024-05-16 12:56:15 -04:00
prefix_caching [CI/Build] Move test_utils.py to tests/utils.py (#4425) 2024-05-13 23:50:09 +09:00
prompts [BugFix] Fix input positions for long context with sliding window (#2088) 2023-12-13 12:28:13 -08:00
quantization [CI/Build] Move test_utils.py to tests/utils.py (#4425) 2024-05-13 23:50:09 +09:00
samplers [CI/Build] Move test_utils.py to tests/utils.py (#4425) 2024-05-13 23:50:09 +09:00
spec_decode [Speculative decoding][Re-take] Enable TP>1 speculative decoding (#4840) 2024-05-16 00:53:51 -07:00
tensorizer_loader [Frontend] [Core] perf: Automatically detect vLLM-tensorized model, update tensorizer to version 2.9.0 (#4208) 2024-05-13 14:57:07 -07:00
tokenization [Bugfix] Fix parameter name in get_tokenizer (#4107) 2024-04-25 19:10:48 -07:00
worker [Core][2/N] Model runner refactoring part 2. Combine prepare prefill / decode to a single API (#4681) 2024-05-15 14:00:10 +09:00
__init__.py [Small] Formatter only checks lints in changed files (#1528) 2023-10-31 15:39:38 -07:00
conftest.py [CI/Build] Further decouple HuggingFace implementation from ours during tests (#4166) 2024-05-14 23:38:40 -07:00
test_cache_block_hashing.py [CI] Try introducing isort. (#3495) 2024-03-25 07:59:47 -07:00
test_config.py [Core] Refactor model loading code (#4097) 2024-04-16 11:34:39 -07:00
test_logger.py [MISC] Rework logger to enable pythonic custom logging configuration to be provided (#4273) 2024-05-01 17:34:40 -07:00
test_logits_processor.py [Misc] Remove unnecessary ModelRunner imports (#4703) 2024-05-09 00:17:17 -07:00
test_regression.py [BugFix] Fix GC bug for LLM class (#2882) 2024-02-14 22:17:44 -08:00
test_sampling_params.py [Bugfix] fix crash if max_tokens=None (#2570) 2024-01-23 22:38:55 -08:00
test_sequence.py [CI/Build] Move test_utils.py to tests/utils.py (#4425) 2024-05-13 23:50:09 +09:00
test_sharded_state_loader.py [Core] Implement sharded state loader (#4690) 2024-05-15 22:11:54 -07:00
utils.py [CI/Build] Move test_utils.py to tests/utils.py (#4425) 2024-05-13 23:50:09 +09:00