| Name | Last commit message | Last commit date |
| --- | --- | --- |
| attention | [Model][AMD] ROCm support for 256 head dims for Gemma (#3972) | 2024-04-10 08:12:00 -07:00 |
| core | [Core] latency optimization (#3890) | 2024-04-06 19:14:06 -07:00 |
| distributed | [Core][Refactor] move parallel_utils into vllm/distributed (#3950) | 2024-04-10 15:33:30 -07:00 |
| engine | [Misc] [Core] Implement RFC "Augment BaseExecutor interfaces to enable hardware-agnostic speculative decoding" (#3837) | 2024-04-09 11:44:15 -07:00 |
| entrypoints | Add option to completion API to truncate prompt tokens (#3144) | 2024-04-05 10:15:42 -07:00 |
| executor | [Misc] [Core] Implement RFC "Augment BaseExecutor interfaces to enable hardware-agnostic speculative decoding" (#3837) | 2024-04-09 11:44:15 -07:00 |
| lora | [Core][Refactor] move parallel_utils into vllm/distributed (#3950) | 2024-04-10 15:33:30 -07:00 |
| model_executor | [Core][Refactor] move parallel_utils into vllm/distributed (#3950) | 2024-04-10 15:33:30 -07:00 |
| spec_decode | [Misc] [Core] Implement RFC "Augment BaseExecutor interfaces to enable hardware-agnostic speculative decoding" (#3837) | 2024-04-09 11:44:15 -07:00 |
| transformers_utils | [BugFix] Pass tokenizer_config to local_tokenizer_group (#3754) | 2024-04-03 20:31:46 -07:00 |
| usage | usage lib get version another way (#3735) | 2024-03-29 15:57:08 -07:00 |
| worker | [Core][Refactor] move parallel_utils into vllm/distributed (#3950) | 2024-04-10 15:33:30 -07:00 |
| __init__.py | [Core] enable out-of-tree model register (#3871) | 2024-04-06 17:11:41 -07:00 |
| block.py | Add Automatic Prefix Caching (#2762) | 2024-03-02 00:50:01 -08:00 |
| config.py | [Bugfix] handle hf_config with architectures == None (#3982) | 2024-04-10 22:28:25 +00:00 |
| logger.py | [CI] Try introducing isort. (#3495) | 2024-03-25 07:59:47 -07:00 |
| outputs.py | [BugFix][Frontend] Fix completion logprobs=0 error (#3731) | 2024-03-29 09:38:21 -07:00 |
| py.typed | Add py.typed so consumers of vLLM can get type checking (#1509) | 2023-10-30 14:50:47 -07:00 |
| sampling_params.py | Add option to completion API to truncate prompt tokens (#3144) | 2024-04-05 10:15:42 -07:00 |
| sequence.py | [Chunked Prefill][4/n] Chunked prefill scheduler. (#3853) | 2024-04-05 10:17:58 -07:00 |
| test_utils.py | [Core][Refactor] move parallel_utils into vllm/distributed (#3950) | 2024-04-10 15:33:30 -07:00 |
| utils.py | [Bugfix] fix utils.py/merge_dict func TypeError: 'type' object is not subscriptable (#3955) | 2024-04-10 04:49:11 +00:00 |