| Name | Last commit | Last commit date |
|------|-------------|------------------|
| `core` | [Experimental] Add multi-LoRA support (#1804) | 2024-01-23 15:26:37 -08:00 |
| `engine` | Refactor Prometheus and Add Request Level Metrics (#2316) | 2024-01-31 14:58:07 -08:00 |
| `entrypoints` | fix python 3.8 syntax (#2716) | 2024-02-01 14:00:58 -08:00 |
| `lora` | Don't build punica kernels by default (#2605) | 2024-01-26 15:19:19 -08:00 |
| `model_executor` | Add Internlm2 (#2666) | 2024-02-01 09:27:40 -08:00 |
| `transformers_utils` | [Experimental] Add multi-LoRA support (#1804) | 2024-01-23 15:26:37 -08:00 |
| `worker` | Fixes assertion failure in prefix caching: the lora index mapping should respect prefix_len (#2688) | 2024-01-31 18:00:13 +01:00 |
| `__init__.py` | Bump up version to v0.3.0 (#2656) | 2024-01-31 00:07:07 -08:00 |
| `block.py` | [Experimental] Prefix Caching Support (#1669) | 2024-01-17 16:32:10 -08:00 |
| `config.py` | fix some bugs (#2689) | 2024-01-31 10:09:23 -08:00 |
| `logger.py` | [Fix] Fix duplicated logging messages (#1524) | 2023-10-31 09:04:47 -07:00 |
| `outputs.py` | [Experimental] Add multi-LoRA support (#1804) | 2024-01-23 15:26:37 -08:00 |
| `prefix.py` | [Experimental] Add multi-LoRA support (#1804) | 2024-01-23 15:26:37 -08:00 |
| `py.typed` | Add py.typed so consumers of vLLM can get type checking (#1509) | 2023-10-30 14:50:47 -07:00 |
| `sampling_params.py` | [Bugfix] fix crash if max_tokens=None (#2570) | 2024-01-23 22:38:55 -08:00 |
| `sequence.py` | Refactor Prometheus and Add Request Level Metrics (#2316) | 2024-01-31 14:58:07 -08:00 |
| `test_utils.py` | Implement custom all reduce kernels (#2192) | 2024-01-27 12:46:35 -08:00 |
| `utils.py` | Support FP8-E5M2 KV Cache (#2279) | 2024-01-28 16:43:54 -08:00 |