vllm/core at 4238bc82f24d5887784b04a353ed93e2360623b4 - vllm

afeldman-nm 4238bc82f2 [Core] Cross-attention KV caching and memory-management (towards eventual encoder/decoder model support) (#4837)	2024-05-29 16:09:13 +00:00
block	[Core] Cross-attention KV caching and memory-management (towards eventual encoder/decoder model support) (#4837)	2024-05-29 16:09:13 +00:00
__init__.py	[Tests] Add block manager and scheduler tests (#3108)	2024-03-05 18:23:34 -08:00
test_block_manager.py	[Core] Cross-attention KV caching and memory-management (towards eventual encoder/decoder model support) (#4837)	2024-05-29 16:09:13 +00:00
test_chunked_prefill_scheduler.py	[Core][Optimization] change python dict to pytorch tensor for blocks to swap (#4659)	2024-05-08 12:07:05 -07:00
test_scheduler.py	[Scheduler] Warning upon preemption and Swapping (#4647)	2024-05-13 23:50:44 +09:00
utils.py	[Core] Cross-attention KV caching and memory-management (towards eventual encoder/decoder model support) (#4837)	2024-05-29 16:09:13 +00:00