vllm/vllm at 8438e0569eaf8496aa3d41deb808f2c831b64ecf - vllm

youkaichao 8438e0569e [Core] RayWorkerVllm --> WorkerWrapper to reduce duplication (#4024) [Core] replace narrow-usage RayWorkerVllm to general WorkerWrapper to reduce code duplication (#4024)	2024-04-17 08:34:33 +00:00
attention	Fix triton compilation issue (#3984)	2024-04-12 16:41:26 -07:00
core	[Speculative decoding 6/9] Integrate speculative decoding with LLMEngine (#3894)	2024-04-16 13:09:21 -07:00
distributed	[Core] avoid too many cuda context by caching p2p test (#4021)	2024-04-13 23:40:21 -07:00
engine	[Core] RayWorkerVllm --> WorkerWrapper to reduce duplication (#4024)	2024-04-17 08:34:33 +00:00
entrypoints	LM Format Enforcer Guided Decoding Support (#3868)	2024-04-16 05:54:57 +00:00
executor	[Core] RayWorkerVllm --> WorkerWrapper to reduce duplication (#4024)	2024-04-17 08:34:33 +00:00
lora	[Bugfix] Fix LoRA bug (#4032)	2024-04-12 16:56:37 -07:00
model_executor	[Core] Refactor model loading code (#4097)	2024-04-16 11:34:39 -07:00
spec_decode	[Speculative decoding 6/9] Integrate speculative decoding with LLMEngine (#3894)	2024-04-16 13:09:21 -07:00
transformers_utils	[Core] Refactor model loading code (#4097)	2024-04-16 11:34:39 -07:00
usage	[mypy] Add mypy type annotation part 1 (#4006)	2024-04-12 14:35:50 -07:00
worker	[Core] RayWorkerVllm --> WorkerWrapper to reduce duplication (#4024)	2024-04-17 08:34:33 +00:00
__init__.py	[Core] enable out-of-tree model register (#3871)	2024-04-06 17:11:41 -07:00
_custom_ops.py	[Misc] Add indirection layer for custom ops (#3913)	2024-04-10 20:26:07 -07:00
block.py	Add Automatic Prefix Caching (#2762)	2024-03-02 00:50:01 -08:00
config.py	[Core] Refactor model loading code (#4097)	2024-04-16 11:34:39 -07:00
logger.py	[Doc] Add typing hints / mypy types cleanup (#3816)	2024-04-11 17:17:21 -07:00
outputs.py	[BugFix] Fix handling of stop strings and stop token ids (#3672)	2024-04-11 15:34:12 -07:00
py.typed	Add py.typed so consumers of vLLM can get type checking (#1509)	2023-10-30 14:50:47 -07:00
sampling_params.py	[mypy] Add mypy type annotation part 1 (#4006)	2024-04-12 14:35:50 -07:00
sequence.py	[Speculative decoding 6/9] Integrate speculative decoding with LLMEngine (#3894)	2024-04-16 13:09:21 -07:00
test_utils.py	[Core][Refactor] move parallel_utils into vllm/distributed (#3950)	2024-04-10 15:33:30 -07:00
utils.py	[Core] RayWorkerVllm --> WorkerWrapper to reduce duplication (#4024)	2024-04-17 08:34:33 +00:00