Commit Graph

35 Commits

Author | SHA1 | Message | Date
SangBin Cho | 67b4221a61 | [Core][5/N] Fully working chunked prefill e2e (#3884) | 2024-04-10 17:56:48 -07:00
youkaichao | 95baec828f | [Core] enable out-of-tree model register (#3871) | 2024-04-06 17:11:41 -07:00
youkaichao | d03d64fd2e | [CI/Build] refactor dockerfile & fix pip cache; [CI/Build] fix pip cache with vllm_nccl & refactor dockerfile to build wheels (#3859) | 2024-04-04 21:53:16 -07:00
bigPYJ1151 | 77a6572aa5 | [HotFix] [CI/Build] Minor fix for CPU backend CI (#3787) | 2024-04-01 22:50:53 -07:00
bigPYJ1151 | 0e3f06fe9c | [Hardware][Intel] Add CPU inference backend (#3634); Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>; Co-authored-by: Yuan Zhou <yuan.zhou@intel.com> | 2024-04-01 22:07:30 -07:00
yhu422 | d8658c8cc1 | Usage Stats Collection (#2852) | 2024-03-28 22:16:12 -07:00
SangBin Cho | 26422e477b | [Test] Make model tests run again and remove --forked from pytest (#3631); Co-authored-by: Simon Mo <simon.mo@hey.com> | 2024-03-28 21:06:40 -07:00
Simon Mo | a4075cba4d | [CI] Add test case to run examples scripts (#3638) | 2024-03-28 14:36:10 -07:00
Simon Mo | 96aa014d1e | fix benchmark format reporting in buildkite (#3693) | 2024-03-28 14:35:16 -07:00
Roger Wang | 45b6ef6513 | feat(benchmarks): Add Prefix Caching Benchmark to Serving Benchmark (#3277) | 2024-03-27 13:39:26 -07:00
youkaichao | 8f44facddd | [Core] remove cupy dependency (#3625) | 2024-03-27 00:33:26 -07:00
xwjiang2010 | 64172a976c | [Feature] Add vision language model support. (#3042) | 2024-03-25 14:16:30 -07:00
Roy | f1c0fc3919 | Migrate logits computation and gather to model_runner (#3233) | 2024-03-20 23:25:01 +00:00
SangBin Cho | 6e435de766 | [1/n][Chunked Prefill] Refactor input query shapes (#3236) | 2024-03-20 14:46:05 -07:00
Cade Daniel | 482b0adf1b | [Testing] Add test_config.py to CI (#3437) | 2024-03-18 12:48:45 -07:00
Simon Mo | 8c654c045f | CI: Add ROCm Docker Build (#2886) | 2024-03-18 19:33:47 +00:00
Simon Mo | 93348d9458 | [CI] Shard tests for LoRA and Kernels to speed up (#3445) | 2024-03-17 14:56:30 -07:00
Antoni Baum | fb96c1e98c | Asynchronous tokenization (#2879) | 2024-03-15 23:37:01 +00:00
Simon Mo | 81653d9688 | [Hotfix] [Debug] test_openai_server.py::test_guided_regex_completion (#3383) | 2024-03-13 17:02:21 -07:00
Cade Daniel | 8437bae6ef | [Speculative decoding 3/9] Worker which speculates, scores, and applies rejection sampling (#3103) | 2024-03-08 23:32:46 -08:00
SangBin Cho | 24aecf421a | [Tests] Add block manager and scheduler tests (#3108) | 2024-03-05 18:23:34 -08:00
Woosuk Kwon | 929b4f2973 | Add LoRA support for Gemma (#3050) | 2024-02-28 13:03:28 -08:00
Ronen Schaffer | 4caf7044e0 | Include tokens from prompt phase in counter_generation_tokens (#2802) | 2024-02-22 14:00:12 -08:00
Zhuohan Li | a61f0521b8 | [Test] Add basic correctness test (#2908) | 2024-02-18 16:44:50 -08:00
Simon Mo | f964493274 | [CI] Ensure documentation build is checked in CI (#2842) | 2024-02-12 22:53:07 -08:00
Roger Wang | a4211a4dc3 | Serving Benchmark Refactoring (#2433) | 2024-02-12 22:53:00 -08:00
Woosuk Kwon | f8ecb84c02 | Speed up Punica compilation (#2632) | 2024-01-27 17:46:56 -08:00
Antoni Baum | 9b945daaf1 | [Experimental] Add multi-LoRA support (#1804); Co-authored-by: Chen Shen <scv119@gmail.com>; Co-authored-by: Shreyas Krishnaswamy <shrekris@anyscale.com>; Co-authored-by: Avnish Narayan <avnish@anyscale.com> | 2024-01-23 15:26:37 -08:00
Simon Mo | 00efdc84ba | Add benchmark serving to CI (#2505) | 2024-01-19 20:20:19 -08:00
shiyi.c_98 | d10f8e1d43 | [Experimental] Prefix Caching Support (#1669); Co-authored-by: DouHappy <2278958187@qq.com>; Co-authored-by: Zhuohan Li <zhuohan123@gmail.com> | 2024-01-17 16:32:10 -08:00
FlorianJoncour | 14cc317ba4 | OpenAI Server refactoring (#2360) | 2024-01-16 21:33:14 -08:00
Simon Mo | 8cd5a992bf | ci: retry on build failure as well (#2457) | 2024-01-16 12:51:04 -08:00
Simon Mo | 947f0b23cc | CI: make sure benchmark script exit on error (#2449) | 2024-01-16 09:50:13 -08:00
Simon Mo | bfc072addf | Allow buildkite to retry build on agent lost (#2446) | 2024-01-15 15:43:15 -08:00
Simon Mo | 6e01e8c1c8 | [CI] Add Buildkite (#2355) | 2024-01-14 12:37:58 -08:00