vllm/core at cb3b2b9ba4a95c413a879e30e2b8674187519a93 - vllm

Varun Sundar Rabindranath cb3b2b9ba4 [Bugfix] Fix incorrect updates to num_computed_tokens in multi-step scheduling (#9038) Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2024-10-06 12:48:11 -07:00
block	[Bugfix] Block manager v2 with preemption and lookahead slots (#8824)	2024-09-29 09:17:45 +08:00
__init__.py	[Tests] Add block manager and scheduler tests (#3108)	2024-03-05 18:23:34 -08:00
test_block_manager.py	[Performance] Enable chunked prefill and prefix caching together (#7753)	2024-08-28 00:36:31 -07:00
test_chunked_prefill_scheduler.py	Fix tests in test_chunked_prefill_scheduler which fail with BlockManager V2 (#8752)	2024-09-24 21:26:36 -07:00
test_num_computed_tokens_update.py	[Bugfix] Fix incorrect updates to num_computed_tokens in multi-step scheduling (#9038)	2024-10-06 12:48:11 -07:00
test_scheduler_encoder_decoder.py	[Core] Subclass ModelRunner to support cross-attention & encoder sequences (towards eventual encoder/decoder model support) (#4942)	2024-08-06 16:51:47 -04:00
test_scheduler.py	Fix test_schedule_swapped_simple in test_scheduler.py (#8780)	2024-09-24 21:26:18 -07:00
test_serialization.py	[Core] Optimize SPMD architecture with delta + serialization optimization (#7109)	2024-08-18 17:57:20 -07:00
utils.py	[Bugfix] Fix incorrect updates to num_computed_tokens in multi-step scheduling (#9038)	2024-10-06 12:48:11 -07:00