vllm/worker at e9d3aa04f6e55e2bb540f0810da97ddd0deebb13 - vllm


SangBin Cho 65bf2ac165 [Core][2/N] Model runner refactoring part 2. Combine prepare prefill / decode to a single API (#4681) This PR combines prepare_prompt and prepare_decode into a single API. It also coalesces the attn metadata for prefill and decode into a single class and allows that metadata to be sliced when running the attn backend. Finally, it refactors subquery_start_loc, which was not refactored in the previous PR.		2024-05-15 14:00:10 +09:00
__init__.py	[Speculative decoding 2/9] Multi-step worker for draft model (#2424)	2024-01-21 16:31:47 -08:00
test_model_runner.py	[Core][2/N] Model runner refactoring part 2. Combine prepare prefill / decode to a single API (#4681)	2024-05-15 14:00:10 +09:00
test_swap.py	[Core][Optimization] change python dict to pytorch tensor for blocks to swap (#4659)	2024-05-08 12:07:05 -07:00