| Name | Last commit | Date |
|------|-------------|------|
| core | Support per-request seed (#2514) | 2024-02-21 11:47:00 -08:00 |
| engine | Support per-request seed (#2514) | 2024-02-21 11:47:00 -08:00 |
| entrypoints | Added early stopping to completion APIs (#2939) | 2024-02-21 18:24:01 -08:00 |
| lora | [BugFix] Fix GC bug for LLM class (#2882) | 2024-02-14 22:17:44 -08:00 |
| model_executor | Use Llama RMSNorm custom op for Gemma (#2974) | 2024-02-21 18:28:23 -08:00 |
| transformers_utils | Support OLMo models. (#2832) | 2024-02-18 21:05:15 -08:00 |
| worker | Support per-request seed (#2514) | 2024-02-21 11:47:00 -08:00 |
| __init__.py | Bump up version to v0.3.2 (#2968) | 2024-02-21 11:47:25 -08:00 |
| block.py | [Experimental] Prefix Caching Support (#1669) | 2024-01-17 16:32:10 -08:00 |
| config.py | Add code-revision config argument for Hugging Face Hub (#2892) | 2024-02-17 22:36:53 -08:00 |
| logger.py | Make vLLM logging formatting optional (#2877) | 2024-02-20 14:38:55 -08:00 |
| outputs.py | Add metrics to RequestOutput (#2876) | 2024-02-20 21:55:57 -08:00 |
| prefix.py | [Experimental] Add multi-LoRA support (#1804) | 2024-01-23 15:26:37 -08:00 |
| py.typed | Add py.typed so consumers of vLLM can get type checking (#1509) | 2023-10-30 14:50:47 -07:00 |
| sampling_params.py | Support per-request seed (#2514) | 2024-02-21 11:47:00 -08:00 |
| sequence.py | Support per-request seed (#2514) | 2024-02-21 11:47:00 -08:00 |
| test_utils.py | Use CuPy for CUDA graphs (#2811) | 2024-02-13 11:32:06 -08:00 |
| utils.py | [Minor] More fix of test_cache.py CI test failure (#2750) | 2024-02-06 11:38:38 -08:00 |