vllm/tests/lora
SangBin Cho 2e9a2227ec
[Lora] Support long context lora (#4787)
Currently we need to call the rotary embedding kernel once per LoRA, which makes it hard to serve multiple LoRAs with long context lengths. This adds a batched rotary embedding kernel and pipes it through.

It replaces the rotary embedding layer with one that is aware of multiple cos/sin caches, one per scaling factor.

Follow up of https://github.com/vllm-project/vllm/pull/3095/files
2024-05-18 16:05:23 +09:00
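The idea behind the batched kernel can be sketched in plain Python (a minimal illustration, not vLLM's actual CUDA implementation; the helper names `build_cos_sin_cache` and `batched_rotary_embedding` are hypothetical, and linear position-interpolation scaling is assumed): each scaling factor gets its own precomputed cos/sin cache, and a per-request index selects the right cache inside a single batched pass instead of launching the kernel once per LoRA.

```python
import math

def build_cos_sin_cache(head_dim, max_pos, base=10000.0, scaling_factor=1.0):
    # Linear (position-interpolation) RoPE scaling: positions are divided
    # by the scaling factor before computing the rotation angles, and the
    # cache is extended to cover the scaled context length.
    inv_freq = [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]
    cache = []
    for pos in range(int(max_pos * scaling_factor)):
        t = pos / scaling_factor
        cache.append([(math.cos(t * f), math.sin(t * f)) for f in inv_freq])
    return cache

def batched_rotary_embedding(queries, positions, scale_ids, caches):
    # One pass over a batch whose requests may use different LoRA scaling
    # factors: scale_ids[i] picks the cos/sin cache for request i, so no
    # separate kernel call per LoRA is needed.
    out = []
    for q, pos, sid in zip(queries, positions, scale_ids):
        cos_sin = caches[sid][pos]
        rotated = []
        for (x1, x2), (c, s) in zip(zip(q[0::2], q[1::2]), cos_sin):
            rotated += [x1 * c - x2 * s, x1 * s + x2 * c]
        out.append(rotated)
    return out
```

In the real kernel the cache lookup and rotation are fused on the GPU, but the selection logic is the same: the scaling-factor index travels with each token through the batch.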
data [Lora] Support long context lora (#4787) 2024-05-18 16:05:23 +09:00
__init__.py [Experimental] Add multi-LoRA support (#1804) 2024-01-23 15:26:37 -08:00
conftest.py [Lora] Support long context lora (#4787) 2024-05-18 16:05:23 +09:00
test_baichuan.py [Kernel] Add punica dimension for Baichuan-13B (#4053) 2024-04-13 07:55:05 -07:00
test_chatglm3.py Enable more models to inference based on LoRA (#3382) 2024-03-25 18:09:31 -07:00
test_gemma.py Add LoRA support for Gemma (#3050) 2024-02-28 13:03:28 -08:00
test_layer_variation.py [CI] Try introducing isort. (#3495) 2024-03-25 07:59:47 -07:00
test_layers.py [Lora] Support long context lora (#4787) 2024-05-18 16:05:23 +09:00
test_llama.py [CI] Try introducing isort. (#3495) 2024-03-25 07:59:47 -07:00
test_long_context.py [Lora] Support long context lora (#4787) 2024-05-18 16:05:23 +09:00
test_lora_checkpoints.py [Bugfix] Fix LoRA loading check (#4138) 2024-04-19 00:59:54 -07:00
test_lora_manager.py [CI] Try introducing isort. (#3495) 2024-03-25 07:59:47 -07:00
test_lora.py [Experimental] Add multi-LoRA support (#1804) 2024-01-23 15:26:37 -08:00
test_mixtral.py [Core] Add MultiprocessingGPUExecutor (#4539) 2024-05-14 10:38:59 -07:00
test_punica.py [Kernel] Add punica dimension for Qwen1.5-32B LoRA (#4850) 2024-05-16 11:16:09 -07:00
test_quant_model.py [Core] Support LoRA on quantized models (#4012) 2024-04-11 21:02:44 -07:00
test_tokenizer_group.py [CI] Try introducing isort. (#3495) 2024-03-25 07:59:47 -07:00
test_utils.py [CI] Try introducing isort. (#3495) 2024-03-25 07:59:47 -07:00
test_worker.py [Core] Refactor model loading code (#4097) 2024-04-16 11:34:39 -07:00
utils.py [Experimental] Add multi-LoRA support (#1804) 2024-01-23 15:26:37 -08:00