| Author | Commit | Message | Date |
|---|---|---|---|
| bigPYJ1151 | 77a6572aa5 | [HotFix] [CI/Build] Minor fix for CPU backend CI (#3787) | 2024-04-01 22:50:53 -07:00 |
| bigPYJ1151 | 0e3f06fe9c | [Hardware][Intel] Add CPU inference backend (#3634); Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>; Co-authored-by: Yuan Zhou <yuan.zhou@intel.com> | 2024-04-01 22:07:30 -07:00 |
| yhu422 | d8658c8cc1 | Usage Stats Collection (#2852) | 2024-03-28 22:16:12 -07:00 |
| SangBin Cho | 26422e477b | [Test] Make model tests run again and remove --forked from pytest (#3631); Co-authored-by: Simon Mo <simon.mo@hey.com> | 2024-03-28 21:06:40 -07:00 |
| Simon Mo | a4075cba4d | [CI] Add test case to run examples scripts (#3638) | 2024-03-28 14:36:10 -07:00 |
| Simon Mo | 96aa014d1e | fix benchmark format reporting in buildkite (#3693) | 2024-03-28 14:35:16 -07:00 |
| Roger Wang | 45b6ef6513 | feat(benchmarks): Add Prefix Caching Benchmark to Serving Benchmark (#3277) | 2024-03-27 13:39:26 -07:00 |
| youkaichao | 8f44facddd | [Core] remove cupy dependency (#3625) | 2024-03-27 00:33:26 -07:00 |
| xwjiang2010 | 64172a976c | [Feature] Add vision language model support. (#3042) | 2024-03-25 14:16:30 -07:00 |
| Roy | f1c0fc3919 | Migrate logits computation and gather to model_runner (#3233) | 2024-03-20 23:25:01 +00:00 |
| SangBin Cho | 6e435de766 | [1/n][Chunked Prefill] Refactor input query shapes (#3236) | 2024-03-20 14:46:05 -07:00 |
| Cade Daniel | 482b0adf1b | [Testing] Add test_config.py to CI (#3437) | 2024-03-18 12:48:45 -07:00 |
| Simon Mo | 8c654c045f | CI: Add ROCm Docker Build (#2886) | 2024-03-18 19:33:47 +00:00 |
| Simon Mo | 93348d9458 | [CI] Shard tests for LoRA and Kernels to speed up (#3445) | 2024-03-17 14:56:30 -07:00 |
| Antoni Baum | fb96c1e98c | Asynchronous tokenization (#2879) | 2024-03-15 23:37:01 +00:00 |
| Simon Mo | 81653d9688 | [Hotfix] [Debug] test_openai_server.py::test_guided_regex_completion (#3383) | 2024-03-13 17:02:21 -07:00 |
| Cade Daniel | 8437bae6ef | [Speculative decoding 3/9] Worker which speculates, scores, and applies rejection sampling (#3103) | 2024-03-08 23:32:46 -08:00 |
| SangBin Cho | 24aecf421a | [Tests] Add block manager and scheduler tests (#3108) | 2024-03-05 18:23:34 -08:00 |
| Woosuk Kwon | 929b4f2973 | Add LoRA support for Gemma (#3050) | 2024-02-28 13:03:28 -08:00 |
| Ronen Schaffer | 4caf7044e0 | Include tokens from prompt phase in counter_generation_tokens (#2802) | 2024-02-22 14:00:12 -08:00 |
| Zhuohan Li | a61f0521b8 | [Test] Add basic correctness test (#2908) | 2024-02-18 16:44:50 -08:00 |
| Simon Mo | f964493274 | [CI] Ensure documentation build is checked in CI (#2842) | 2024-02-12 22:53:07 -08:00 |
| Roger Wang | a4211a4dc3 | Serving Benchmark Refactoring (#2433) | 2024-02-12 22:53:00 -08:00 |
| Woosuk Kwon | f8ecb84c02 | Speed up Punica compilation (#2632) | 2024-01-27 17:46:56 -08:00 |
| Antoni Baum | 9b945daaf1 | [Experimental] Add multi-LoRA support (#1804); Co-authored-by: Chen Shen <scv119@gmail.com>; Co-authored-by: Shreyas Krishnaswamy <shrekris@anyscale.com>; Co-authored-by: Avnish Narayan <avnish@anyscale.com> | 2024-01-23 15:26:37 -08:00 |
| Simon Mo | 00efdc84ba | Add benchmark serving to CI (#2505) | 2024-01-19 20:20:19 -08:00 |
| shiyi.c_98 | d10f8e1d43 | [Experimental] Prefix Caching Support (#1669); Co-authored-by: DouHappy <2278958187@qq.com>; Co-authored-by: Zhuohan Li <zhuohan123@gmail.com> | 2024-01-17 16:32:10 -08:00 |
| FlorianJoncour | 14cc317ba4 | OpenAI Server refactoring (#2360) | 2024-01-16 21:33:14 -08:00 |
| Simon Mo | 8cd5a992bf | ci: retry on build failure as well (#2457) | 2024-01-16 12:51:04 -08:00 |
| Simon Mo | 947f0b23cc | CI: make sure benchmark script exit on error (#2449) | 2024-01-16 09:50:13 -08:00 |
| Simon Mo | bfc072addf | Allow buildkite to retry build on agent lost (#2446) | 2024-01-15 15:43:15 -08:00 |
| Simon Mo | 6e01e8c1c8 | [CI] Add Buildkite (#2355) | 2024-01-14 12:37:58 -08:00 |