vllm/vllm

Latest commit: `8438e0569e` by youkaichao — [Core] replace narrow-usage RayWorkerVllm to general WorkerWrapper to reduce code duplication (#4024) — 2024-04-17 08:34:33 +00:00
| Name | Last commit | Date |
| --- | --- | --- |
| attention | Fix triton compilation issue (#3984) | 2024-04-12 16:41:26 -07:00 |
| core | [Speculative decoding 6/9] Integrate speculative decoding with LLMEngine (#3894) | 2024-04-16 13:09:21 -07:00 |
| distributed | [Core] avoid too many cuda context by caching p2p test (#4021) | 2024-04-13 23:40:21 -07:00 |
| engine | [Core] RayWorkerVllm --> WorkerWrapper to reduce duplication (#4024) | 2024-04-17 08:34:33 +00:00 |
| entrypoints | LM Format Enforcer Guided Decoding Support (#3868) | 2024-04-16 05:54:57 +00:00 |
| executor | [Core] RayWorkerVllm --> WorkerWrapper to reduce duplication (#4024) | 2024-04-17 08:34:33 +00:00 |
| lora | [Bugfix] Fix LoRA bug (#4032) | 2024-04-12 16:56:37 -07:00 |
| model_executor | [Core] Refactor model loading code (#4097) | 2024-04-16 11:34:39 -07:00 |
| spec_decode | [Speculative decoding 6/9] Integrate speculative decoding with LLMEngine (#3894) | 2024-04-16 13:09:21 -07:00 |
| transformers_utils | [Core] Refactor model loading code (#4097) | 2024-04-16 11:34:39 -07:00 |
| usage | [mypy] Add mypy type annotation part 1 (#4006) | 2024-04-12 14:35:50 -07:00 |
| worker | [Core] RayWorkerVllm --> WorkerWrapper to reduce duplication (#4024) | 2024-04-17 08:34:33 +00:00 |
| __init__.py | [Core] enable out-of-tree model register (#3871) | 2024-04-06 17:11:41 -07:00 |
| _custom_ops.py | [Misc] Add indirection layer for custom ops (#3913) | 2024-04-10 20:26:07 -07:00 |
| block.py | Add Automatic Prefix Caching (#2762) | 2024-03-02 00:50:01 -08:00 |
| config.py | [Core] Refactor model loading code (#4097) | 2024-04-16 11:34:39 -07:00 |
| logger.py | [Doc] Add typing hints / mypy types cleanup (#3816) | 2024-04-11 17:17:21 -07:00 |
| outputs.py | [BugFix] Fix handling of stop strings and stop token ids (#3672) | 2024-04-11 15:34:12 -07:00 |
| py.typed | Add py.typed so consumers of vLLM can get type checking (#1509) | 2023-10-30 14:50:47 -07:00 |
| sampling_params.py | [mypy] Add mypy type annotation part 1 (#4006) | 2024-04-12 14:35:50 -07:00 |
| sequence.py | [Speculative decoding 6/9] Integrate speculative decoding with LLMEngine (#3894) | 2024-04-16 13:09:21 -07:00 |
| test_utils.py | [Core][Refactor] move parallel_utils into vllm/distributed (#3950) | 2024-04-10 15:33:30 -07:00 |
| utils.py | [Core] RayWorkerVllm --> WorkerWrapper to reduce duplication (#4024) | 2024-04-17 08:34:33 +00:00 |