| Name | Latest commit | Date |
| --- | --- | --- |
| attention | [Misc] Add get_name method to attention backends (#4685) | 2024-05-08 09:59:31 -07:00 |
| core | [Core][Optimization] change copy-on-write from dict[int, list] to list (#4648) | 2024-05-07 11:06:32 -07:00 |
| distributed | [Core][Distributed] support cpu&device in broadcast tensor dict (#4660) | 2024-05-07 19:34:47 -07:00 |
| engine | [Bugfix] Fix asyncio.Task not being subscriptable (#4623) | 2024-05-06 09:31:05 -07:00 |
| entrypoints | [Bugfix] Fix asyncio.Task not being subscriptable (#4623) | 2024-05-06 09:31:05 -07:00 |
| executor | [Bug fix][Core] fixup ngram not setup correctly (#4551) | 2024-05-07 11:40:18 -07:00 |
| logging | [MISC] Rework logger to enable pythonic custom logging configuration to be provided (#4273) | 2024-05-01 17:34:40 -07:00 |
| lora | [Core] Faster startup for LoRA enabled models (#4634) | 2024-05-08 10:33:18 -07:00 |
| model_executor | [CI] Make mistral tests pass (#4596) | 2024-05-08 08:44:35 -07:00 |
| spec_decode | [Bug fix][Core] fixup ngram not setup correctly (#4551) | 2024-05-07 11:40:18 -07:00 |
| transformers_utils | [Misc] centralize all usage of environment variables (#4548) | 2024-05-02 11:13:25 -07:00 |
| usage | [Misc] centralize all usage of environment variables (#4548) | 2024-05-02 11:13:25 -07:00 |
| worker | [Core] Faster startup for LoRA enabled models (#4634) | 2024-05-08 10:33:18 -07:00 |
| __init__.py | bump version to v0.4.2 (#4600) | 2024-05-04 17:09:49 -07:00 |
| _custom_ops.py | [Kernel] Use flashinfer for decoding (#4353) | 2024-05-03 15:51:27 -07:00 |
| block.py | Add Automatic Prefix Caching (#2762) | 2024-03-02 00:50:01 -08:00 |
| config.py | Disable cuda version check in vllm-openai image (#4530) | 2024-05-05 16:58:55 -07:00 |
| envs.py | [Misc] add installation time env vars (#4574) | 2024-05-03 15:55:56 -07:00 |
| logger.py | [Misc] centralize all usage of environment variables (#4548) | 2024-05-02 11:13:25 -07:00 |
| outputs.py | [BugFix] Fix handling of stop strings and stop token ids (#3672) | 2024-04-11 15:34:12 -07:00 |
| py.typed | Add py.typed so consumers of vLLM can get type checking (#1509) | 2023-10-30 14:50:47 -07:00 |
| sampling_params.py | [Bugfix] Use random seed if seed is -1 (#4531) | 2024-05-01 10:41:17 -07:00 |
| sequence.py | [Core][Optimization] change python dict to pytorch tensor (#4607) | 2024-05-06 21:30:27 -07:00 |
| test_utils.py | [Core][Refactor] move parallel_utils into vllm/distributed (#3950) | 2024-04-10 15:33:30 -07:00 |
| utils.py | Disable cuda version check in vllm-openai image (#4530) | 2024-05-05 16:58:55 -07:00 |