Commit Graph

193 Commits

Author SHA1 Message Date
Kevin H. Luu
65950e8f58
[ci] Entrypoints run upon changes in vllm/ (#7423)
Signed-off-by: kevin <kevin@anyscale.com>
2024-08-12 10:18:03 -07:00
Lily Liu
ec2affa8ae
[Kernel] Flashinfer correctness fix for v0.1.3 (#7319) 2024-08-12 07:59:17 +00:00
Kevin H. Luu
469b3bc538
[ci] Make building wheels per commit optional (#7278)
Signed-off-by: kevin <kevin@anyscale.com>
2024-08-07 11:34:25 -07:00
afeldman-nm
fd95e026e0
[Core] Subclass ModelRunner to support cross-attention & encoder sequences (towards eventual encoder/decoder model support) (#4942)
Co-authored-by: Andrew Feldman <afeld2012@gmail.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2024-08-06 16:51:47 -04:00
Dipika Sikka
a3bbbfa1d8
[BugFix] Fix DeepSeek remote code (#7178) 2024-08-06 08:16:53 -07:00
Simon Mo
e3c664bfcb
[Build] Add initial conditional testing spec (#6841) 2024-08-05 17:39:22 -07:00
Kuntai Du
67d745cc68
[CI] Temporarily turn off H100 performance benchmark (#7104) 2024-08-02 23:52:44 -07:00
youkaichao
04e5583425
[ci][distributed] merge distributed test commands (#7097)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-08-02 21:33:53 -07:00
omkar kakarparthi
562e580abc
Update run-amd-test.sh (#7044) 2024-08-01 13:12:37 -07:00
Sage Moore
7e0861bd0b
[CI/Build] Update PyTorch to 2.4.0 (#6951)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-08-01 11:11:24 -07:00
Alexei-V-Ivanov-AMD
a72a424b3e
[Build/CI] Fixing Docker Hub quota issue. (#7043) 2024-08-01 11:07:37 -07:00
HandH1998
6512937de1
Support W4A8 quantization for vllm (#5218) 2024-07-31 07:55:21 -06:00
Cyrus Leung
f230cc2ca6
[Bugfix] Fix broadcasting logic for multi_modal_kwargs (#6836) 2024-07-31 10:38:45 +08:00
Cade Daniel
c32ab8be1a
[Speculative decoding] Add serving benchmark for llama3 70b + speculative decoding (#6964) 2024-07-31 00:53:21 +00:00
Cade Daniel
fb4f530bf5
[CI] [nightly benchmark] Do not re-download sharegpt dataset if exists (#6706) 2024-07-30 16:28:49 -07:00
Cade Daniel
79319cedfa
[Nightly benchmarking suite] Remove pkill python from run benchmark suite (#6965) 2024-07-30 16:28:05 -07:00
Simon Mo
40c27a7cbb
[Build] Temporarily Disable Kernels and LoRA tests (#6961) 2024-07-30 14:59:48 -07:00
Roger Wang
ecb33a28cb
[CI/Build][Doc] Update CI and Doc for VLM example changes (#6860) 2024-07-27 09:54:14 +00:00
Joe
14dbd5a767
[Model] H2O Danube3-4b (#6451) 2024-07-26 20:47:50 -07:00
Sanger Steel
969d032265
[Bugfix]: Fix Tensorizer test failures (#6835) 2024-07-26 20:02:25 -07:00
Li, Jiang
3bbb4936dc
[Hardware] [Intel] Enable Multiprocessing and tensor parallel in CPU backend and update documentation (#6125) 2024-07-26 13:50:10 -07:00
Michael Goin
07278c37dd
[Model] Support Nemotron models (Nemotron-3, Nemotron-4, Minitron) (#6611) 2024-07-26 14:33:42 -04:00
Kevin H. Luu
2eb9f4ff26
[ci] Mark tensorizer as soft fail and separate from grouped test (#6810)
[ci] Mark tensorizer test as soft fail and separate it from grouped test in fast check (#6810)
Signed-off-by: kevin <kevin@anyscale.com>
2024-07-25 18:08:33 -07:00
Kuntai Du
6a1e25b151
[Doc] Add documentations for nightly benchmarks (#6412) 2024-07-25 11:57:16 -07:00
Robert Shaw
889da130e7
[ Misc ] fp8-marlin channelwise via compressed-tensors (#6524)
Co-authored-by: mgoin <michael@neuralmagic.com>
2024-07-25 09:46:04 -07:00
Alexei-V-Ivanov-AMD
b570811706
[Build/CI] Update run-amd-test.sh. Enable Docker Hub login. (#6711) 2024-07-24 05:01:14 -07:00
youkaichao
72fc704803
[build] relax wheel size limit (#6704) 2024-07-23 14:03:49 -07:00
Kevin H. Luu
69d5ae38dc
[ci] Use different sccache bucket for CUDA 11.8 wheel build (#6656)
Signed-off-by: kevin <kevin@anyscale.com>
2024-07-22 14:20:41 -07:00
Robert Shaw
9364f74eee
[ Kernel ] Enable fp8-marlin for fbgemm-fp8 models (#6606) 2024-07-20 18:50:10 +00:00
Matt Wong
06d6c5fe9f
[Bugfix][CI/Build][Hardware][AMD] Fix AMD tests, add HF cache, update CK FA, add partially supported model notes (#6543) 2024-07-20 09:39:07 -07:00
Robert Shaw
683e3cb9c4
[ Misc ] fbgemm checkpoints (#6559) 2024-07-20 09:36:57 -07:00
Robert Shaw
4cc24f01b1
[ Kernel ] Enable Dynamic Per Token fp8 (#6547) 2024-07-19 23:08:15 +00:00
Robert Shaw
dbe5588554
[ Misc ] non-uniform quantization via compressed-tensors for Llama (#6515) 2024-07-18 22:39:18 -04:00
Nick Hill
b5672a112c
[Core] Multiprocessing Pipeline Parallel support (#6130)
Co-authored-by: Murali Andoorveedu <muralidhar.andoorveedu@centml.ai>
2024-07-18 19:15:52 -07:00
Tyler Michael Smith
1689219ebf
[CI/Build] Build on Ubuntu 20.04 instead of 22.04 (#6517) 2024-07-18 17:29:25 -07:00
youkaichao
f53b8f0d05
[ci][test] add correctness test for cpu offloading (#6549) 2024-07-18 23:41:06 +00:00
Rui Qiao
61e592747c
[Core] Introduce SPMD worker execution using Ray accelerated DAG (#6032)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>
2024-07-17 22:27:09 -07:00
youkaichao
1c27d25fb5
[core][model] yet another cpu offload implementation (#6496)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-07-17 20:54:35 -07:00
youkaichao
09c2eb85dd
[ci][distributed] add pipeline parallel correctness test (#6410) 2024-07-16 15:44:22 -07:00
Cyrus Leung
d97011512e
[CI/Build] vLLM cache directory for images (#6444) 2024-07-15 23:12:25 -07:00
Woosuk Kwon
4552e37b55
[CI/Build][TPU] Add TPU CI test (#6277)
Co-authored-by: kevin <kevin@anyscale.com>
2024-07-15 14:31:16 -07:00
youkaichao
69672f116c
[core][distributed] simplify code to support pipeline parallel (#6406) 2024-07-14 21:20:51 -07:00
Robert Shaw
a754dc2cb9
[CI/Build] Cross python wheel (#6394) 2024-07-14 18:54:46 -07:00
Robert Shaw
73030b7dae
[ Misc ] Enable Quantizing All Layers of DeekSeekv2 (#6423) 2024-07-14 21:38:42 +00:00
youkaichao
ccd3c04571
[ci][build] fix commit id (#6420)
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2024-07-14 22:16:21 +08:00
Tyler Michael Smith
9dad5cc859
[Kernel] Turn off CUTLASS scaled_mm for Ada Lovelace (#6384) 2024-07-14 13:37:19 +00:00
Robert Shaw
fb6af8bc08
[ Misc ] Apply MoE Refactor to Deepseekv2 To Support Fp8 (#6417) 2024-07-13 20:03:58 -07:00
Robert Shaw
babf52dade
[ Misc ] More Cleanup of Marlin (#6359)
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
2024-07-13 10:21:37 +00:00
youkaichao
41708e5034
[ci] try to add multi-node tests (#6280)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
2024-07-12 21:51:48 -07:00
Simon Mo
6bc9710f6e
Fix release pipeline's dir permission (#6391) 2024-07-12 15:52:43 -07:00