# vllm/vllm

Last updated: 2024-05-08 10:33:18 -07:00
| Name | Last commit | Date |
| --- | --- | --- |
| attention/ | [Misc] Add get_name method to attention backends (#4685) | 2024-05-08 09:59:31 -07:00 |
| core/ | [Core][Optimization] change copy-on-write from dict[int, list] to list (#4648) | 2024-05-07 11:06:32 -07:00 |
| distributed/ | [Core][Distributed] support cpu&device in broadcast tensor dict (#4660) | 2024-05-07 19:34:47 -07:00 |
| engine/ | [Bugfix] Fix asyncio.Task not being subscriptable (#4623) | 2024-05-06 09:31:05 -07:00 |
| entrypoints/ | [Bugfix] Fix asyncio.Task not being subscriptable (#4623) | 2024-05-06 09:31:05 -07:00 |
| executor/ | [Bug fix][Core] fixup ngram not setup correctly (#4551) | 2024-05-07 11:40:18 -07:00 |
| logging/ | [MISC] Rework logger to enable pythonic custom logging configuration to be provided (#4273) | 2024-05-01 17:34:40 -07:00 |
| lora/ | [Core] Faster startup for LoRA enabled models (#4634) | 2024-05-08 10:33:18 -07:00 |
| model_executor/ | [CI] Make mistral tests pass (#4596) | 2024-05-08 08:44:35 -07:00 |
| spec_decode/ | [Bug fix][Core] fixup ngram not setup correctly (#4551) | 2024-05-07 11:40:18 -07:00 |
| transformers_utils/ | [Misc] centralize all usage of environment variables (#4548) | 2024-05-02 11:13:25 -07:00 |
| usage/ | [Misc] centralize all usage of environment variables (#4548) | 2024-05-02 11:13:25 -07:00 |
| worker/ | [Core] Faster startup for LoRA enabled models (#4634) | 2024-05-08 10:33:18 -07:00 |
| __init__.py | bump version to v0.4.2 (#4600) | 2024-05-04 17:09:49 -07:00 |
| _custom_ops.py | [Kernel] Use flashinfer for decoding (#4353) | 2024-05-03 15:51:27 -07:00 |
| block.py | Add Automatic Prefix Caching (#2762) | 2024-03-02 00:50:01 -08:00 |
| config.py | Disable cuda version check in vllm-openai image (#4530) | 2024-05-05 16:58:55 -07:00 |
| envs.py | [Misc] add installation time env vars (#4574) | 2024-05-03 15:55:56 -07:00 |
| logger.py | [Misc] centralize all usage of environment variables (#4548) | 2024-05-02 11:13:25 -07:00 |
| outputs.py | [BugFix] Fix handling of stop strings and stop token ids (#3672) | 2024-04-11 15:34:12 -07:00 |
| py.typed | Add py.typed so consumers of vLLM can get type checking (#1509) | 2023-10-30 14:50:47 -07:00 |
| sampling_params.py | [Bugfix] Use random seed if seed is -1 (#4531) | 2024-05-01 10:41:17 -07:00 |
| sequence.py | [Core][Optimization] change python dict to pytorch tensor (#4607) | 2024-05-06 21:30:27 -07:00 |
| test_utils.py | [Core][Refactor] move parallel_utils into vllm/distributed (#3950) | 2024-04-10 15:33:30 -07:00 |
| utils.py | Disable cuda version check in vllm-openai image (#4530) | 2024-05-05 16:58:55 -07:00 |