vllm/vllm
Latest commit 63e7176f26 by youkaichao: [Core][Refactor] move parallel_utils into vllm/distributed (#3950), 2024-04-10 15:33:30 -07:00
| Name | Last commit message | Last commit date |
| --- | --- | --- |
| attention | [Model][AMD] ROCm support for 256 head dims for Gemma (#3972) | 2024-04-10 08:12:00 -07:00 |
| core | [Core] latency optimization (#3890) | 2024-04-06 19:14:06 -07:00 |
| distributed | [Core][Refactor] move parallel_utils into vllm/distributed (#3950) | 2024-04-10 15:33:30 -07:00 |
| engine | [Misc] [Core] Implement RFC "Augment BaseExecutor interfaces to enable hardware-agnostic speculative decoding" (#3837) | 2024-04-09 11:44:15 -07:00 |
| entrypoints | Add option to completion API to truncate prompt tokens (#3144) | 2024-04-05 10:15:42 -07:00 |
| executor | [Misc] [Core] Implement RFC "Augment BaseExecutor interfaces to enable hardware-agnostic speculative decoding" (#3837) | 2024-04-09 11:44:15 -07:00 |
| lora | [Core][Refactor] move parallel_utils into vllm/distributed (#3950) | 2024-04-10 15:33:30 -07:00 |
| model_executor | [Core][Refactor] move parallel_utils into vllm/distributed (#3950) | 2024-04-10 15:33:30 -07:00 |
| spec_decode | [Misc] [Core] Implement RFC "Augment BaseExecutor interfaces to enable hardware-agnostic speculative decoding" (#3837) | 2024-04-09 11:44:15 -07:00 |
| transformers_utils | [BugFix] Pass tokenizer_config to local_tokenizer_group (#3754) | 2024-04-03 20:31:46 -07:00 |
| usage | usage lib get version another way (#3735) | 2024-03-29 15:57:08 -07:00 |
| worker | [Core][Refactor] move parallel_utils into vllm/distributed (#3950) | 2024-04-10 15:33:30 -07:00 |
| __init__.py | [Core] enable out-of-tree model register (#3871) | 2024-04-06 17:11:41 -07:00 |
| block.py | Add Automatic Prefix Caching (#2762) | 2024-03-02 00:50:01 -08:00 |
| config.py | [Bugfix] handle hf_config with architectures == None (#3982) | 2024-04-10 22:28:25 +00:00 |
| logger.py | [CI] Try introducing isort. (#3495) | 2024-03-25 07:59:47 -07:00 |
| outputs.py | [BugFix][Frontend] Fix completion logprobs=0 error (#3731) | 2024-03-29 09:38:21 -07:00 |
| py.typed | Add py.typed so consumers of vLLM can get type checking (#1509) | 2023-10-30 14:50:47 -07:00 |
| sampling_params.py | Add option to completion API to truncate prompt tokens (#3144) | 2024-04-05 10:15:42 -07:00 |
| sequence.py | [Chunked Prefill][4/n] Chunked prefill scheduler. (#3853) | 2024-04-05 10:17:58 -07:00 |
| test_utils.py | [Core][Refactor] move parallel_utils into vllm/distributed (#3950) | 2024-04-10 15:33:30 -07:00 |
| utils.py | [Bugfix] fix utils.py/merge_dict func TypeError: 'type' object is not subscriptable (#3955) | 2024-04-10 04:49:11 +00:00 |
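The latest commit (#3950) moves modules out of `vllm/model_executor/parallel_utils` into the new top-level `vllm/distributed` package. A minimal sketch of how downstream code could rewrite dotted import paths across this move; the helper `migrate_import` is hypothetical and not part of vLLM, and only the two package prefixes are taken from the commit title:

```python
# Package prefixes from the #3950 commit title; the helper below is
# illustrative only and does not exist in vLLM itself.
OLD_PREFIX = "vllm.model_executor.parallel_utils"
NEW_PREFIX = "vllm.distributed"

def migrate_import(path: str) -> str:
    """Rewrite a dotted module path from the pre-#3950 location to the new one.

    Paths outside the moved package are returned unchanged.
    """
    if path == OLD_PREFIX or path.startswith(OLD_PREFIX + "."):
        return NEW_PREFIX + path[len(OLD_PREFIX):]
    return path
```

For example, `migrate_import("vllm.model_executor.parallel_utils.parallel_state")` yields `"vllm.distributed.parallel_state"`, while unrelated paths such as `"vllm.config"` pass through untouched.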