Directory listing of `vllm/vllm` (last updated 2024-05-06 09:31:05 -07:00):
| Name | Last commit | Date |
| --- | --- | --- |
| attention/ | [Kernel] Use flashinfer for decoding (#4353) | 2024-05-03 15:51:27 -07:00 |
| core/ | [Misc][Refactor] Introduce ExecuteModelData (#4540) | 2024-05-03 17:47:07 -07:00 |
| distributed/ | [Core][Distributed] enable allreduce for multiple tp groups (#4566) | 2024-05-02 17:32:33 -07:00 |
| engine/ | [Bugfix] Fix asyncio.Task not being subscriptable (#4623) | 2024-05-06 09:31:05 -07:00 |
| entrypoints/ | [Bugfix] Fix asyncio.Task not being subscriptable (#4623) | 2024-05-06 09:31:05 -07:00 |
| executor/ | [Misc][Refactor] Introduce ExecuteModelData (#4540) | 2024-05-03 17:47:07 -07:00 |
| logging/ | [MISC] Rework logger to enable pythonic custom logging configuration to be provided (#4273) | 2024-05-01 17:34:40 -07:00 |
| lora/ | [Kernel] Full Tensor Parallelism for LoRA Layers (#3524) | 2024-04-27 00:03:48 -07:00 |
| model_executor/ | [Kernel] Support MoE Fp8 Checkpoints for Mixtral (Static Weights with Dynamic/Static Activations) (#4527) | 2024-05-04 11:45:16 -07:00 |
| spec_decode/ | [Misc][Refactor] Introduce ExecuteModelData (#4540) | 2024-05-03 17:47:07 -07:00 |
| transformers_utils/ | [Misc] centralize all usage of environment variables (#4548) | 2024-05-02 11:13:25 -07:00 |
| usage/ | [Misc] centralize all usage of environment variables (#4548) | 2024-05-02 11:13:25 -07:00 |
| worker/ | [Misc][Refactor] Introduce ExecuteModelData (#4540) | 2024-05-03 17:47:07 -07:00 |
| __init__.py | bump version to v0.4.2 (#4600) | 2024-05-04 17:09:49 -07:00 |
| _custom_ops.py | [Kernel] Use flashinfer for decoding (#4353) | 2024-05-03 15:51:27 -07:00 |
| block.py | Add Automatic Prefix Caching (#2762) | 2024-03-02 00:50:01 -08:00 |
| config.py | Disable cuda version check in vllm-openai image (#4530) | 2024-05-05 16:58:55 -07:00 |
| envs.py | [Misc] add installation time env vars (#4574) | 2024-05-03 15:55:56 -07:00 |
| logger.py | [Misc] centralize all usage of environment variables (#4548) | 2024-05-02 11:13:25 -07:00 |
| outputs.py | [BugFix] Fix handling of stop strings and stop token ids (#3672) | 2024-04-11 15:34:12 -07:00 |
| py.typed | Add py.typed so consumers of vLLM can get type checking (#1509) | 2023-10-30 14:50:47 -07:00 |
| sampling_params.py | [Bugfix] Use random seed if seed is -1 (#4531) | 2024-05-01 10:41:17 -07:00 |
| sequence.py | [Misc][Refactor] Introduce ExecuteModelData (#4540) | 2024-05-03 17:47:07 -07:00 |
| test_utils.py | [Core][Refactor] move parallel_utils into vllm/distributed (#3950) | 2024-04-10 15:33:30 -07:00 |
| utils.py | Disable cuda version check in vllm-openai image (#4530) | 2024-05-05 16:58:55 -07:00 |
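Several of these modules surface directly in vLLM's public API: `sampling_params.py` defines `SamplingParams`, `outputs.py` defines the `RequestOutput` objects returned to callers, and the entrypoints/engine/executor/worker packages sit behind the top-level `LLM` class. Below is a minimal offline-inference sketch against that API as of this snapshot (v0.4.2, per the `__init__.py` bump above); the model name and prompt are illustrative placeholders, not anything implied by the listing.

```python
# Minimal vLLM offline-inference sketch (v0.4.2-era public API).
from vllm import LLM, SamplingParams  # re-exported via vllm/__init__.py

# SamplingParams lives in sampling_params.py; per #4531, a seed of -1 is
# treated as "pick a random seed", while a fixed value makes sampling
# reproducible across runs.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, seed=42)

# LLM is the offline entrypoint; under the hood it wires together the
# engine, executor, and worker packages listed in the table above.
# "facebook/opt-125m" is just an example model.
llm = LLM(model="facebook/opt-125m")

prompts = ["The capital of France is"]
# generate() returns one RequestOutput (outputs.py) per prompt.
for output in llm.generate(prompts, sampling_params):
    print(output.prompt, "->", output.outputs[0].text)
```

Stop-string and stop-token handling on these outputs is covered by #3672 (`outputs.py`), and runtime behavior can be tuned through the `VLLM_*` environment variables centralized in `envs.py` by #4548.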