Commit Graph

5 Commits

Author SHA1 Message Date
Kuntai Du
81ede99ca4
[Core] Deprecating block manager v1 and make block manager v2 default (#8704)
Removing the block manager v1. This is the initial piece of prefix-caching-centric design. In order to achieve prefix-caching-centric design, we need to simplify the code path so that we only use v2 block manager (which has much higher performance on prefix caching).
2024-10-17 11:38:15 -05:00
Lily Liu
775f00f81e
[Speculative Decoding] Test refactor (#8317)
Co-authored-by: youkaichao <youkaichao@126.com>
2024-09-11 14:07:34 -07:00
Nick Hill
5cf9254a9c
[BugFix] Fix use of per-request seed with pipeline parallel (#6698) 2024-07-30 10:40:08 -07:00
Nick Hill
2cf0df3381
[Bugfix] Fix speculative decode seeded test (#6743) 2024-07-24 08:58:31 -07:00
Thomas Parnell
d4201e06d5
[Bugfix] Make spec. decode respect per-request seed. (#6034)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2024-07-18 19:22:08 -07:00