Lily Liu | 43c413ec57 | [Kernel] Use flashinfer for decoding (#4353) (Co-authored-by: LiuXiaoxuanPKU <llilyliupku@gmail.com>) | 2024-05-03 15:51:27 -07:00
SangBin Cho | 0f8a91401c | [Core] Ignore infeasible swap requests. (#4557) | 2024-05-02 14:31:20 -07:00
SangBin Cho | 0d62fe58db | [Bug fix][Core] assert num_new_tokens == 1 fails when SamplingParams.n is not 1 and max_tokens is large & Add tests for preemption (#4451) | 2024-05-01 19:24:13 -07:00
SangBin Cho | 36729bac13 | [Test] Test multiple attn backend for chunked prefill. (#4023) | 2024-04-12 09:56:57 -07:00
SangBin Cho | e42df7227d | [Test] Add xformer and flash attn tests (#3961) (Co-authored-by: Simon Mo <simon.mo@hey.com>) | 2024-04-11 03:09:50 +00:00
SangBin Cho | 67b4221a61 | [Core][5/N] Fully working chunked prefill e2e (#3884) | 2024-04-10 17:56:48 -07:00
SangBin Cho | 26422e477b | [Test] Make model tests run again and remove --forked from pytest (#3631) (Co-authored-by: Simon Mo <simon.mo@hey.com>) | 2024-03-28 21:06:40 -07:00
SangBin Cho | 6e435de766 | [1/n][Chunked Prefill] Refactor input query shapes (#3236) | 2024-03-20 14:46:05 -07:00
Zhuohan Li | a61f0521b8 | [Test] Add basic correctness test (#2908) | 2024-02-18 16:44:50 -08:00