| Name | Last commit | Commit date |
| --- | --- | --- |
| attention | [Bugfix] Add kv_scale input parameter to CPU backend (#3840) | 2024-04-04 04:33:08 +00:00 |
| core | [3/N] Refactor scheduler for chunked prefill scheduling (#3550) | 2024-04-03 14:13:49 -07:00 |
| engine | [Core] [Frontend] Make detokenization optional (#3749) | 2024-04-03 21:52:18 -07:00 |
| entrypoints | [Frontend][Bugfix] allow using the default middleware with a root path (#3788) | 2024-04-02 01:20:28 -07:00 |
| executor | [Speculative decoding] Adding configuration object for speculative decoding (#3706) | 2024-04-03 00:40:57 +00:00 |
| lora | [BugFix] Use consistent logger everywhere (#3738) | 2024-03-29 23:26:44 +00:00 |
| model_executor | [Core] improve robustness of pynccl (#3860) | 2024-04-04 16:52:12 -07:00 |
| spec_decode | [Bugfix] Add __init__.py files for vllm/core/block/ and vllm/spec_decode/ (#3798) | 2024-04-02 12:35:31 -07:00 |
| transformers_utils | [BugFix] Pass tokenizer_config to local_tokenizer_group (#3754) | 2024-04-03 20:31:46 -07:00 |
| usage | usage lib get version another way (#3735) | 2024-03-29 15:57:08 -07:00 |
| worker | Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290) | 2024-04-03 14:15:55 -07:00 |
| __init__.py | [CI/Build] 0.4.0.post1, fix sm 7.0/7.5 binary (#3803) | 2024-04-02 12:57:04 -07:00 |
| block.py | Add Automatic Prefix Caching (#2762) | 2024-03-02 00:50:01 -08:00 |
| config.py | Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290) | 2024-04-03 14:15:55 -07:00 |
| logger.py | [CI] Try introducing isort. (#3495) | 2024-03-25 07:59:47 -07:00 |
| outputs.py | [BugFix][Frontend] Fix completion logprobs=0 error (#3731) | 2024-03-29 09:38:21 -07:00 |
| py.typed | Add py.typed so consumers of vLLM can get type checking (#1509) | 2023-10-30 14:50:47 -07:00 |
| sampling_params.py | [Core] [Frontend] Make detokenization optional (#3749) | 2024-04-03 21:52:18 -07:00 |
| sequence.py | [2/N] Chunked prefill data update (#3538) | 2024-03-28 10:06:01 -07:00 |
| test_utils.py | [Core][Test] move local_rank to the last arg with default value(#3711) | 2024-03-28 21:19:45 -07:00 |
| utils.py | Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290) | 2024-04-03 14:15:55 -07:00 |