vllm/vllm at d03d64fd2e22f1a48e7b78c66d7644e6b6230fb7 - vllm

youkaichao c391e4b68e [Core] improve robustness of pynccl (#3860)		2024-04-04 16:52:12 -07:00
attention	[Bugfix] Add kv_scale input parameter to CPU backend (#3840)	2024-04-04 04:33:08 +00:00
core	[3/N] Refactor scheduler for chunked prefill scheduling (#3550)	2024-04-03 14:13:49 -07:00
engine	[Core] [Frontend] Make detokenization optional (#3749)	2024-04-03 21:52:18 -07:00
entrypoints	[Frontend][Bugfix] allow using the default middleware with a root path (#3788)	2024-04-02 01:20:28 -07:00
executor	[Speculative decoding] Adding configuration object for speculative decoding (#3706)	2024-04-03 00:40:57 +00:00
lora	[BugFix] Use consistent logger everywhere (#3738)	2024-03-29 23:26:44 +00:00
model_executor	[Core] improve robustness of pynccl (#3860)	2024-04-04 16:52:12 -07:00
spec_decode	[Bugfix] Add `__init__.py` files for `vllm/core/block/` and `vllm/spec_decode/` (#3798)	2024-04-02 12:35:31 -07:00
transformers_utils	[BugFix] Pass tokenizer_config to local_tokenizer_group (#3754)	2024-04-03 20:31:46 -07:00
usage	usage lib get version another way (#3735)	2024-03-29 15:57:08 -07:00
worker	Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290)	2024-04-03 14:15:55 -07:00
__init__.py	[CI/Build] 0.4.0.post1, fix sm 7.0/7.5 binary (#3803)	2024-04-02 12:57:04 -07:00
block.py	Add Automatic Prefix Caching (#2762)	2024-03-02 00:50:01 -08:00
config.py	Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290)	2024-04-03 14:15:55 -07:00
logger.py	[CI] Try introducing isort. (#3495)	2024-03-25 07:59:47 -07:00
outputs.py	[BugFix][Frontend] Fix completion logprobs=0 error (#3731)	2024-03-29 09:38:21 -07:00
py.typed	Add py.typed so consumers of vLLM can get type checking (#1509)	2023-10-30 14:50:47 -07:00
sampling_params.py	[Core] [Frontend] Make detokenization optional (#3749)	2024-04-03 21:52:18 -07:00
sequence.py	[2/N] Chunked prefill data update (#3538)	2024-03-28 10:06:01 -07:00
test_utils.py	[Core][Test] move local_rank to the last arg with default value(#3711)	2024-03-28 21:19:45 -07:00
utils.py	Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290)	2024-04-03 14:15:55 -07:00