vllm/vllm/lora
__init__.py [Experimental] Add multi-LoRA support (#1804) 2024-01-23 15:26:37 -08:00
fully_sharded_layers.py [Bugfix] Add fully sharded layer for QKVParallelLinearWithLora (#5665) 2024-06-21 04:46:28 +00:00
layers.py [Model] Add Gemma 2 (#5908) 2024-06-27 13:33:56 -07:00
lora.py [Model] Add base class for LoRA-supported models (#5018) 2024-06-27 16:03:04 +08:00
models.py [Lora] Use safetensor keys instead of adapter_config.json to find unexpected modules. (#5909) 2024-06-30 17:11:15 +00:00
punica.py [misc][cuda] use nvml to avoid accidentally cuda initialization (#6007) 2024-06-30 20:07:34 -07:00
request.py [Lora] Support long context lora (#4787) 2024-05-18 16:05:23 +09:00
utils.py [Bugfix] Add fully sharded layer for QKVParallelLinearWithLora (#5665) 2024-06-21 04:46:28 +00:00
worker_manager.py [LoRA] Add support for pinning lora adapters in the LRU cache (#5603) 2024-06-21 15:42:46 -07:00
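The files above form vLLM's LoRA package: adapter-aware layer implementations (layers.py, fully_sharded_layers.py), adapter weight containers (lora.py), per-model adapter management (models.py), the Punica batched-LoRA kernels wrapper (punica.py), the request-level API (request.py), and the worker-side LRU cache of loaded adapters (worker_manager.py). Below is a minimal sketch of how this package is typically exercised through vLLM's public API; the base model name and adapter path are placeholders, not values taken from this listing.

```python
# Minimal multi-LoRA usage sketch (assumed placeholder model and adapter path).
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# enable_lora=True activates the machinery in vllm/lora so adapters can be
# loaded and swapped per request by the worker manager.
llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_lora=True)

sampling_params = SamplingParams(temperature=0.0, max_tokens=64)

# LoRARequest(name, unique integer id, local adapter path); the integer id
# identifies the adapter in the worker-side LRU cache.
lora_request = LoRARequest("sql_adapter", 1, "/path/to/sql_lora_adapter")

outputs = llm.generate(
    ["Write a SQL query that lists all users."],
    sampling_params,
    lora_request=lora_request,
)
print(outputs[0].outputs[0].text)
```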