Commit Graph

15 Commits

Author SHA1 Message Date
Alexander Matveev
3476ed0809
[Core] Optimize block_manager_v2 vs block_manager_v1 (to make V2 default) (#5602) 2024-07-01 20:10:37 -07:00
Cyrus Leung
0e9164b40a
[mypy] Enable type checking for test directory (#5017) 2024-06-15 04:45:31 +00:00
leiwen83
1b8a0d71cf
[Core][Bugfix]: fix prefix caching for blockv2 (#5364)
Signed-off-by: Lei Wen <wenlei03@qiyi.com>
Co-authored-by: Lei Wen <wenlei03@qiyi.com>
2024-06-14 17:23:56 -07:00
Kaiyang Chen
10c38e3e46
[Misc]: Implement CPU/GPU swapping in BlockManagerV2 (#3834) 2024-06-03 13:37:11 -07:00
afeldman-nm
4238bc82f2
[Core] Cross-attention KV caching and memory-management (towards eventual encoder/decoder model support) (#4837) 2024-05-29 16:09:13 +00:00
Michał Moskal
d4f3985907
[Core] Sliding window for block manager v2 (#4545)
Co-authored-by: Ruth Evans <ruthevans@Ruths-MacBook-Pro.local>
2024-05-28 11:07:07 +09:00
leiwen83
e64fde4b01
[Core][Bugfix]: fix prefix caching for blockv2 (#4764)
Co-authored-by: Lei Wen <wenlei03@qiyi.com>
2024-05-24 10:07:09 -07:00
Cyrus Leung
350f9e107f
[CI/Build] Move test_utils.py to tests/utils.py (#4425)
Since #4335 was merged, I've noticed that the definition of ServerRunner in the tests is the same as in the test for the OpenAI API. I have moved the class to the test utilities to avoid code duplication. (Although it has only been repeated twice so far, I will add another similar test suite in #4200, which would duplicate the code a third time.)

Also, I have moved the test utilities file (test_utils.py) into the test directory (tests/utils.py), since none of its code is actually used in the main package. Note that I have added __init__.py to each test subpackage and updated the ray.init() call in the test utilities file so that tests/utils.py can be imported with a relative import.
2024-05-13 23:50:09 +09:00
youkaichao
469f85c782
[Core][Optimization] change copy-on-write from dict[int, list] to list (#4648) 2024-05-07 11:06:32 -07:00
leiwen83
24750f4cad
[Core] Enable prefix caching with block manager v2 enabled (#4142)
Co-authored-by: Lei Wen <wenlei03@qiyi.com>
Co-authored-by: Sage Moore <sagemoore@utexas.edu>
2024-05-01 11:20:32 -07:00
Cade Daniel
e95cd87959
[Speculative decoding 6/9] Integrate speculative decoding with LLMEngine (#3894) 2024-04-16 13:09:21 -07:00
Cade Daniel
e7c7067b45
[Misc] [Core] Implement RFC "Augment BaseExecutor interfaces to enable hardware-agnostic speculative decoding" (#3837) 2024-04-09 11:44:15 -07:00
Cade Daniel
eb69d68804
[Misc] [CI/Build] Speed up block manager CPU-only unit tests ~10x by opting-out of GPU cleanup (#3783) 2024-04-02 00:49:51 +00:00
Cade Daniel
93deb0b38f
[Speculative decoding 4/9] Lookahead scheduling for speculative decoding (#3250) 2024-04-01 22:55:24 +00:00
Cade Daniel
14ccd94c89
[Core][Bugfix]Refactor block manager for better testability (#3492) 2024-03-27 23:59:28 -07:00