vllm/tests
SangBin Cho 2e9a2227ec
[Lora] Support long context lora (#4787)
Currently we need to call the rotary embedding kernel once per LoRA, which makes it hard to serve multiple long-context LoRAs. Add a batched rotary embedding kernel and pipe it through.

It replaces the rotary embedding layer with one that is aware of multiple cos-sin caches, one per scaling factor.

Follow-up of https://github.com/vllm-project/vllm/pull/3095/files
2024-05-18 16:05:23 +09:00
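A minimal PyTorch sketch of the idea described above: precompute one cos-sin cache per linear scaling factor, concatenate them into a single flat cache, and let each token index into its factor's region so one batched call can serve requests with mixed scaling factors. Names and shapes here are illustrative assumptions, not vLLM's actual kernel or layer API.

```python
import torch


class BatchedRotaryEmbedding(torch.nn.Module):
    """Illustrative rotary embedding with one cos-sin cache per scaling factor.

    All per-factor caches are concatenated into a single buffer so that a
    single batched lookup can serve tokens using different LoRA scaling
    factors (hypothetical sketch, not vLLM's implementation).
    """

    def __init__(self, head_dim: int, max_pos: int, base: float,
                 scaling_factors: list[float]):
        super().__init__()
        inv_freq = 1.0 / (base ** (
            torch.arange(0, head_dim, 2).float() / head_dim))
        caches = []
        self.offsets = {}  # scaling factor -> start offset in the flat cache
        offset = 0
        for factor in scaling_factors:
            length = int(max_pos * factor)
            t = torch.arange(length).float() / factor  # scaled positions
            freqs = torch.outer(t, inv_freq)
            caches.append(torch.cat([freqs.cos(), freqs.sin()], dim=-1))
            self.offsets[factor] = offset
            offset += length
        # One flat cache; token i reads row offsets[factor_i] + position_i.
        self.register_buffer("cos_sin_cache", torch.cat(caches, dim=0))

    def forward(self, positions: torch.Tensor, offsets: torch.Tensor,
                q: torch.Tensor) -> torch.Tensor:
        # positions, offsets: [num_tokens]; q: [num_tokens, head_dim]
        cos_sin = self.cos_sin_cache[positions + offsets]
        cos, sin = cos_sin.chunk(2, dim=-1)
        q1, q2 = q.chunk(2, dim=-1)
        # Neox-style rotation applied with the per-token cache row.
        return torch.cat([q1 * cos - q2 * sin, q2 * cos + q1 * sin], dim=-1)
```

At runtime each token would carry the cache offset of its LoRA's scaling factor, so requests with different long-context scaling factors can share one batched kernel launch instead of one rotary embedding call per LoRA.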
async_engine [CI/Build] Move test_utils.py to tests/utils.py (#4425) 2024-05-13 23:50:09 +09:00
basic_correctness [Scheduler] Warning upon preemption and Swapping (#4647) 2024-05-13 23:50:44 +09:00
core [Scheduler] Warning upon preemption and Swapping (#4647) 2024-05-13 23:50:44 +09:00
distributed [Core][Distributed] remove graph mode function (#4818) 2024-05-16 10:59:52 -07:00
engine [Build/CI] Extending the set of AMD tests with Regression, Basic Correctness, Distributed, Engine, Llava Tests (#4797) 2024-05-16 20:58:25 -07:00
entrypoints [Frontend] Support OpenAI batch file format (#4794) 2024-05-15 19:13:36 -04:00
fp8_kv Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290) 2024-04-03 14:15:55 -07:00
kernels [Bugfix] fix rope error when load models with different dtypes (#4835) 2024-05-17 18:43:34 +09:00
lora [Lora] Support long context lora (#4787) 2024-05-18 16:05:23 +09:00
metrics [CI/Build] Move test_utils.py to tests/utils.py (#4425) 2024-05-13 23:50:09 +09:00
model_executor [CI/Build] Move test_utils.py to tests/utils.py (#4425) 2024-05-13 23:50:09 +09:00
models Add GPTQ Marlin 2:4 sparse structured support (#4790) 2024-05-16 12:56:15 -04:00
prefix_caching [CI/Build] Move test_utils.py to tests/utils.py (#4425) 2024-05-13 23:50:09 +09:00
prompts [BugFix] Fix input positions for long context with sliding window (#2088) 2023-12-13 12:28:13 -08:00
quantization [CI/Build] Move test_utils.py to tests/utils.py (#4425) 2024-05-13 23:50:09 +09:00
samplers [CI/Build] Move test_utils.py to tests/utils.py (#4425) 2024-05-13 23:50:09 +09:00
spec_decode [Speculative decoding][Re-take] Enable TP>1 speculative decoding (#4840) 2024-05-16 00:53:51 -07:00
tensorizer_loader [Frontend] [Core] perf: Automatically detect vLLM-tensorized model, update tensorizer to version 2.9.0 (#4208) 2024-05-13 14:57:07 -07:00
tokenization [Bugfix] Fix parameter name in get_tokenizer (#4107) 2024-04-25 19:10:48 -07:00
worker [Core][2/N] Model runner refactoring part 2. Combine prepare prefill / decode to a single API (#4681) 2024-05-15 14:00:10 +09:00
__init__.py [Small] Formatter only checks lints in changed files (#1528) 2023-10-31 15:39:38 -07:00
conftest.py [CI/Build] Further decouple HuggingFace implementation from ours during tests (#4166) 2024-05-14 23:38:40 -07:00
test_cache_block_hashing.py [CI] Try introducing isort. (#3495) 2024-03-25 07:59:47 -07:00
test_config.py [Core] Refactor model loading code (#4097) 2024-04-16 11:34:39 -07:00
test_logger.py [MISC] Rework logger to enable pythonic custom logging configuration to be provided (#4273) 2024-05-01 17:34:40 -07:00
test_logits_processor.py [Misc] Remove unnecessary ModelRunner imports (#4703) 2024-05-09 00:17:17 -07:00
test_regression.py [BugFix] Fix GC bug for LLM class (#2882) 2024-02-14 22:17:44 -08:00
test_sampling_params.py [Bugfix] fix crash if max_tokens=None (#2570) 2024-01-23 22:38:55 -08:00
test_sequence.py [CI/Build] Move test_utils.py to tests/utils.py (#4425) 2024-05-13 23:50:09 +09:00
test_sharded_state_loader.py [Core] Implement sharded state loader (#4690) 2024-05-15 22:11:54 -07:00
utils.py [CI/Build] Move test_utils.py to tests/utils.py (#4425) 2024-05-13 23:50:09 +09:00