vllm/vllm at c013d32c758699fbe5804af1b9d9408acd6cb8b7 - vllm

Jee Li 11dd6ebb89 [Misc] Avoid loading incorrect LoRA config (#3777)		2024-04-09 19:47:15 -07:00
attention	[ROCm][Hardware][AMD] Use Triton Kernel for default FA on ROCm (#3643)	2024-04-09 15:10:47 -07:00
core	[Core] latency optimization (#3890)	2024-04-06 19:14:06 -07:00
engine	[Misc] [Core] Implement RFC "Augment BaseExecutor interfaces to enable hardware-agnostic speculative decoding" (#3837)	2024-04-09 11:44:15 -07:00
entrypoints	Add option to completion API to truncate prompt tokens (#3144)	2024-04-05 10:15:42 -07:00
executor	[Misc] [Core] Implement RFC "Augment BaseExecutor interfaces to enable hardware-agnostic speculative decoding" (#3837)	2024-04-09 11:44:15 -07:00
lora	[Misc] Avoid loading incorrect LoRA config (#3777)	2024-04-09 19:47:15 -07:00
model_executor	[Bugfix] Fix KeyError on loading GPT-NeoX (#3925)	2024-04-09 12:11:31 -07:00
spec_decode	[Misc] [Core] Implement RFC "Augment BaseExecutor interfaces to enable hardware-agnostic speculative decoding" (#3837)	2024-04-09 11:44:15 -07:00
transformers_utils	[BugFix] Pass tokenizer_config to local_tokenizer_group (#3754)	2024-04-03 20:31:46 -07:00
usage	usage lib get version another way (#3735)	2024-03-29 15:57:08 -07:00
worker	[Misc] [Core] Implement RFC "Augment BaseExecutor interfaces to enable hardware-agnostic speculative decoding" (#3837)	2024-04-09 11:44:15 -07:00
__init__.py	[Core] enable out-of-tree model register (#3871)	2024-04-06 17:11:41 -07:00
block.py	Add Automatic Prefix Caching (#2762)	2024-03-02 00:50:01 -08:00
config.py	[Misc] [Core] Implement RFC "Augment BaseExecutor interfaces to enable hardware-agnostic speculative decoding" (#3837)	2024-04-09 11:44:15 -07:00
logger.py	[CI] Try introducing isort. (#3495)	2024-03-25 07:59:47 -07:00
outputs.py	[BugFix][Frontend] Fix completion logprobs=0 error (#3731)	2024-03-29 09:38:21 -07:00
py.typed	Add py.typed so consumers of vLLM can get type checking (#1509)	2023-10-30 14:50:47 -07:00
sampling_params.py	Add option to completion API to truncate prompt tokens (#3144)	2024-04-05 10:15:42 -07:00
sequence.py	[Chunked Prefill][4/n] Chunked prefill scheduler. (#3853)	2024-04-05 10:17:58 -07:00
test_utils.py	[Core] separate distributed_init from worker (#3904)	2024-04-09 08:49:02 +00:00
utils.py	Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290)	2024-04-03 14:15:55 -07:00