Kuntai Du
81ede99ca4
[Core] Deprecating block manager v1 and make block manager v2 default ( #8704 )
...
Removing the block manager v1. This is the initial piece of prefix-caching-centric design. In order to achieve prefix-caching-centric design, we need to simplify the code path so that we only use v2 block manager (which has much higher performance on prefix caching).
2024-10-17 11:38:15 -05:00
sroy745
f3a507f1d3
[Core] Add an environment variable which needs to be set explicitly to allow BlockSpaceManagerV1 ( #9149 )
2024-10-10 14:17:17 +08:00
sroy745
91add85ec4
Fix failing spec decode test ( #9054 )
2024-10-03 23:07:29 +00:00
Cody Yu
973617ae02
[Speculative decoding][Re-take] Enable TP>1 speculative decoding ( #4840 )
...
Co-authored-by: Cade Daniel <edacih@gmail.com>
Co-authored-by: Cade Daniel <cade@anyscale.com>
2024-05-16 00:53:51 -07:00
leiwen83
4bb53e2dde
[BugFix] fix num_lookahead_slots missing in async executor ( #4165 )
...
Co-authored-by: Lei Wen <wenlei03@qiyi.com>
2024-04-30 10:12:59 -07:00
Cade Daniel
62b8aebc6f
[Speculative decoding 7/9] Speculative decoding end-to-end correctness tests. ( #3951 )
2024-04-23 08:02:36 +00:00