youkaichao
|
469f85c782
|
[Core][Optimization] change copy-on-write from dict[int, list] to list (#4648)
|
2024-05-07 11:06:32 -07:00 |
|
leiwen83
|
24750f4cad
|
[Core] Enable prefix caching with block manager v2 enabled (#4142)
Co-authored-by: Lei Wen <wenlei03@qiyi.com>
Co-authored-by: Sage Moore <sagemoore@utexas.edu>
|
2024-05-01 11:20:32 -07:00 |
|
Cade Daniel
|
e95cd87959
|
[Speculative decoding 6/9] Integrate speculative decoding with LLMEngine (#3894)
|
2024-04-16 13:09:21 -07:00 |
|
Cade Daniel
|
e7c7067b45
|
[Misc] [Core] Implement RFC "Augment BaseExecutor interfaces to enable hardware-agnostic speculative decoding" (#3837)
|
2024-04-09 11:44:15 -07:00 |
|
Cade Daniel
|
eb69d68804
|
[Misc] [CI/Build] Speed up block manager CPU-only unit tests ~10x by opting-out of GPU cleanup (#3783)
|
2024-04-02 00:49:51 +00:00 |
|
Cade Daniel
|
93deb0b38f
|
[Speculative decoding 4/9] Lookahead scheduling for speculative decoding (#3250)
|
2024-04-01 22:55:24 +00:00 |
|
Cade Daniel
|
14ccd94c89
|
[Core][Bugfix]Refactor block manager for better testability (#3492)
|
2024-03-27 23:59:28 -07:00 |
|