vllm/vllm/transformers_utils/configs
SangBin Cho 2e9a2227ec
[Lora] Support long context lora (#4787)
Currently the rotary embedding kernel must be invoked once per LoRA, which makes it hard to serve multiple long-context LoRAs. Add a batched rotary embedding kernel and pipe it through.

It replaces the rotary embedding layer with one that maintains a separate cos-sin cache per scaling factor.

Follow-up to https://github.com/vllm-project/vllm/pull/3095/files
2024-05-18 16:05:23 +09:00
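For illustration, here is a minimal sketch of the idea behind the commit (hypothetical names, not vLLM's actual classes or the real CUDA kernel): keep one cos/sin cache per linear scaling factor inside a single concatenated buffer, and index into it with per-factor offsets, so tokens from sequences served by different long-context LoRAs can be rotated together.

```python
import torch


class MultiScaleRotaryEmbedding(torch.nn.Module):
    """One concatenated cos/sin cache covering several linear scaling
    factors; per-factor offsets select the right slice for each token.
    Illustrative sketch only, not vLLM's actual implementation."""

    def __init__(self, head_dim: int, max_pos: int, base: float = 10000.0,
                 scaling_factors: tuple = (1.0, 2.0, 4.0)):
        super().__init__()
        inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
        caches, self.offsets, offset = [], {}, 0
        for factor in scaling_factors:
            # Linear scaling: positions are divided by the scaling factor.
            t = torch.arange(int(max_pos * factor)).float() / factor
            freqs = torch.outer(t, inv_freq)  # [len, head_dim // 2]
            caches.append(torch.cat([freqs.cos(), freqs.sin()], dim=-1))
            self.offsets[factor] = offset
            offset += caches[-1].shape[0]
        # All per-factor caches live in one buffer, enabling a single
        # batched lookup instead of one kernel call per LoRA.
        self.register_buffer("cos_sin_cache", torch.cat(caches, dim=0))

    def forward(self, positions: torch.Tensor, x: torch.Tensor,
                scaling_factor: float) -> torch.Tensor:
        # A real batched kernel would take a per-token offset tensor instead
        # of a scalar, so tokens from LoRAs with different scaling factors
        # could share one launch.
        idx = positions + self.offsets[scaling_factor]
        cos, sin = self.cos_sin_cache[idx].chunk(2, dim=-1)
        x1, x2 = x.chunk(2, dim=-1)
        return torch.cat([x1 * cos - x2 * sin, x2 * cos + x1 * sin], dim=-1)


# Usage: rotate 8 query vectors from a sequence served with a 4x-scaled LoRA.
rope = MultiScaleRotaryEmbedding(head_dim=64, max_pos=2048)
q = torch.randn(8, 64)
out = rope(torch.arange(8), q, scaling_factor=4.0)
```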
File          Last commit                                                                      Date
__init__.py   [Model] Add support for DBRX (#3660)                                             2024-03-27 13:01:46 -07:00
arctic.py     [Model] Snowflake arctic model implementation (#4652)                            2024-05-09 22:37:14 +00:00
chatglm.py    [Lora] Support long context lora (#4787)                                         2024-05-18 16:05:23 +09:00
dbrx.py       [CI] Disable non-lazy string operation on logging (#4326)                        2024-04-26 00:16:58 -07:00
falcon.py     Add Falcon support (new) (#592)                                                  2023-08-02 14:04:39 -07:00
jais.py       [Mypy] Part 3 fix typing for nested directories for most of directory (#4161)   2024-04-22 21:32:44 -07:00
mpt.py        [CI] Try introducing isort. (#3495)                                              2024-03-25 07:59:47 -07:00