vllm/tests/lora
SangBin Cho 2e9a2227ec
[Lora] Support long context lora (#4787)
Currently we need to call the rotary embedding kernel once per LoRA, which makes it hard to serve multiple LoRAs with long context lengths. This adds a batched rotary embedding kernel and pipes it through.

It replaces the rotary embedding layer with one that is aware of multiple cos/sin caches, one per scaling factor.

Follow up of https://github.com/vllm-project/vllm/pull/3095/files
2024-05-18 16:05:23 +09:00
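The idea behind the batched kernel can be sketched in plain Python (a minimal illustration, not vLLM's actual CUDA implementation; the helper names `build_cos_sin_cache` and `batched_rotary_embedding` are hypothetical, and linear position-interpolation scaling is assumed): each scaling factor gets its own precomputed cos/sin cache, and a per-request index selects the right cache inside a single batched pass instead of launching the kernel once per LoRA.

```python
import math

def build_cos_sin_cache(head_dim, max_pos, base=10000.0, scaling_factor=1.0):
    # Linear (position-interpolation) RoPE scaling: positions are divided
    # by the scaling factor before computing the rotation angles, and the
    # cache is extended to cover the scaled context length.
    inv_freq = [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]
    cache = []
    for pos in range(int(max_pos * scaling_factor)):
        t = pos / scaling_factor
        cache.append([(math.cos(t * f), math.sin(t * f)) for f in inv_freq])
    return cache

def batched_rotary_embedding(queries, positions, scale_ids, caches):
    # One pass over a batch whose requests may use different LoRA scaling
    # factors: scale_ids[i] picks the cos/sin cache for request i, so no
    # separate kernel call per LoRA is needed.
    out = []
    for q, pos, sid in zip(queries, positions, scale_ids):
        cos_sin = caches[sid][pos]
        rotated = []
        for (x1, x2), (c, s) in zip(zip(q[0::2], q[1::2]), cos_sin):
            rotated += [x1 * c - x2 * s, x1 * s + x2 * c]
        out.append(rotated)
    return out
```

In the real kernel the cache lookup and rotation are fused on the GPU, but the selection logic is the same: the scaling-factor index travels with each token through the batch.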
data [Lora] Support long context lora (#4787) 2024-05-18 16:05:23 +09:00
__init__.py [Experimental] Add multi-LoRA support (#1804) 2024-01-23 15:26:37 -08:00
conftest.py [Lora] Support long context lora (#4787) 2024-05-18 16:05:23 +09:00
test_baichuan.py [Kernel] Add punica dimension for Baichuan-13B (#4053) 2024-04-13 07:55:05 -07:00
test_chatglm3.py Enable more models to inference based on LoRA (#3382) 2024-03-25 18:09:31 -07:00
test_gemma.py Add LoRA support for Gemma (#3050) 2024-02-28 13:03:28 -08:00
test_layer_variation.py [CI] Try introducing isort. (#3495) 2024-03-25 07:59:47 -07:00
test_layers.py [Lora] Support long context lora (#4787) 2024-05-18 16:05:23 +09:00
test_llama.py [CI] Try introducing isort. (#3495) 2024-03-25 07:59:47 -07:00
test_long_context.py [Lora] Support long context lora (#4787) 2024-05-18 16:05:23 +09:00
test_lora_checkpoints.py [Bugfix] Fix LoRA loading check (#4138) 2024-04-19 00:59:54 -07:00
test_lora_manager.py [CI] Try introducing isort. (#3495) 2024-03-25 07:59:47 -07:00
test_lora.py [Experimental] Add multi-LoRA support (#1804) 2024-01-23 15:26:37 -08:00
test_mixtral.py [Core] Add MultiprocessingGPUExecutor (#4539) 2024-05-14 10:38:59 -07:00
test_punica.py [Kernel] Add punica dimension for Qwen1.5-32B LoRA (#4850) 2024-05-16 11:16:09 -07:00
test_quant_model.py [Core] Support LoRA on quantized models (#4012) 2024-04-11 21:02:44 -07:00
test_tokenizer_group.py [CI] Try introducing isort. (#3495) 2024-03-25 07:59:47 -07:00
test_utils.py [CI] Try introducing isort. (#3495) 2024-03-25 07:59:47 -07:00
test_worker.py [Core] Refactor model loading code (#4097) 2024-04-16 11:34:39 -07:00
utils.py [Experimental] Add multi-LoRA support (#1804) 2024-01-23 15:26:37 -08:00