Cade Daniel
fb4f530bf5
[CI] [nightly benchmark] Do not re-download sharegpt dataset if exists ( #6706 )
2024-07-30 16:28:49 -07:00
Cade Daniel
79319cedfa
[Nightly benchmarking suite] Remove pkill python from run benchmark suite ( #6965 )
2024-07-30 16:28:05 -07:00
Simon Mo
40c27a7cbb
[Build] Temporarily Disable Kernels and LoRA tests ( #6961 )
2024-07-30 14:59:48 -07:00
Roger Wang
ecb33a28cb
[CI/Build][Doc] Update CI and Doc for VLM example changes ( #6860 )
2024-07-27 09:54:14 +00:00
Joe
14dbd5a767
[Model] H2O Danube3-4b ( #6451 )
2024-07-26 20:47:50 -07:00
Sanger Steel
969d032265
[Bugfix]: Fix Tensorizer test failures ( #6835 )
2024-07-26 20:02:25 -07:00
Li, Jiang
3bbb4936dc
[Hardware] [Intel] Enable Multiprocessing and tensor parallel in CPU backend and update documentation ( #6125 )
2024-07-26 13:50:10 -07:00
Michael Goin
07278c37dd
[Model] Support Nemotron models (Nemotron-3, Nemotron-4, Minitron) ( #6611 )
2024-07-26 14:33:42 -04:00
Kevin H. Luu
2eb9f4ff26
[ci] Mark tensorizer test as soft fail and separate it from grouped test in fast check ( #6810 )
...
Signed-off-by: kevin <kevin@anyscale.com>
2024-07-25 18:08:33 -07:00
Kuntai Du
6a1e25b151
[Doc] Add documentations for nightly benchmarks ( #6412 )
2024-07-25 11:57:16 -07:00
Robert Shaw
889da130e7
[ Misc ] fp8-marlin channelwise via compressed-tensors ( #6524 )
...
Co-authored-by: mgoin <michael@neuralmagic.com>
2024-07-25 09:46:04 -07:00
Alexei-V-Ivanov-AMD
b570811706
[Build/CI] Update run-amd-test.sh. Enable Docker Hub login. ( #6711 )
2024-07-24 05:01:14 -07:00
youkaichao
72fc704803
[build] relax wheel size limit ( #6704 )
2024-07-23 14:03:49 -07:00
Kevin H. Luu
69d5ae38dc
[ci] Use different sccache bucket for CUDA 11.8 wheel build ( #6656 )
...
Signed-off-by: kevin <kevin@anyscale.com>
2024-07-22 14:20:41 -07:00
Robert Shaw
9364f74eee
[ Kernel ] Enable fp8-marlin for fbgemm-fp8 models ( #6606 )
2024-07-20 18:50:10 +00:00
Matt Wong
06d6c5fe9f
[Bugfix][CI/Build][Hardware][AMD] Fix AMD tests, add HF cache, update CK FA, add partially supported model notes ( #6543 )
2024-07-20 09:39:07 -07:00
Robert Shaw
683e3cb9c4
[ Misc ] fbgemm checkpoints ( #6559 )
2024-07-20 09:36:57 -07:00
Robert Shaw
4cc24f01b1
[ Kernel ] Enable Dynamic Per Token fp8 ( #6547 )
2024-07-19 23:08:15 +00:00
Robert Shaw
dbe5588554
[ Misc ] non-uniform quantization via compressed-tensors for Llama ( #6515 )
2024-07-18 22:39:18 -04:00
Nick Hill
b5672a112c
[Core] Multiprocessing Pipeline Parallel support ( #6130 )
...
Co-authored-by: Murali Andoorveedu <muralidhar.andoorveedu@centml.ai>
2024-07-18 19:15:52 -07:00
Tyler Michael Smith
1689219ebf
[CI/Build] Build on Ubuntu 20.04 instead of 22.04 ( #6517 )
2024-07-18 17:29:25 -07:00
youkaichao
f53b8f0d05
[ci][test] add correctness test for cpu offloading ( #6549 )
2024-07-18 23:41:06 +00:00
Rui Qiao
61e592747c
[Core] Introduce SPMD worker execution using Ray accelerated DAG ( #6032 )
...
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>
2024-07-17 22:27:09 -07:00
youkaichao
1c27d25fb5
[core][model] yet another cpu offload implementation ( #6496 )
...
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-07-17 20:54:35 -07:00
youkaichao
09c2eb85dd
[ci][distributed] add pipeline parallel correctness test ( #6410 )
2024-07-16 15:44:22 -07:00
Cyrus Leung
d97011512e
[CI/Build] vLLM cache directory for images ( #6444 )
2024-07-15 23:12:25 -07:00
Woosuk Kwon
4552e37b55
[CI/Build][TPU] Add TPU CI test ( #6277 )
...
Co-authored-by: kevin <kevin@anyscale.com>
2024-07-15 14:31:16 -07:00
youkaichao
69672f116c
[core][distributed] simplify code to support pipeline parallel ( #6406 )
2024-07-14 21:20:51 -07:00
Robert Shaw
a754dc2cb9
[CI/Build] Cross python wheel ( #6394 )
2024-07-14 18:54:46 -07:00
Robert Shaw
73030b7dae
[ Misc ] Enable Quantizing All Layers of Deepseekv2 ( #6423 )
2024-07-14 21:38:42 +00:00
youkaichao
ccd3c04571
[ci][build] fix commit id ( #6420 )
...
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2024-07-14 22:16:21 +08:00
Tyler Michael Smith
9dad5cc859
[Kernel] Turn off CUTLASS scaled_mm for Ada Lovelace ( #6384 )
2024-07-14 13:37:19 +00:00
Robert Shaw
fb6af8bc08
[ Misc ] Apply MoE Refactor to Deepseekv2 To Support Fp8 ( #6417 )
2024-07-13 20:03:58 -07:00
Robert Shaw
babf52dade
[ Misc ] More Cleanup of Marlin ( #6359 )
...
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
2024-07-13 10:21:37 +00:00
youkaichao
41708e5034
[ci] try to add multi-node tests ( #6280 )
...
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
2024-07-12 21:51:48 -07:00
Simon Mo
6bc9710f6e
Fix release pipeline's dir permission ( #6391 )
2024-07-12 15:52:43 -07:00
Simon Mo
21b2dcedab
Fix release pipeline's -e flag ( #6390 )
2024-07-12 14:08:04 -07:00
Simon Mo
07b35af86d
Fix interpolation in release pipeline ( #6389 )
2024-07-12 14:03:39 -07:00
Simon Mo
bb1a784b05
Fix release-pipeline.yaml ( #6388 )
2024-07-12 14:00:57 -07:00
Simon Mo
d719ba24c5
Build some nightly wheels by default ( #6380 )
2024-07-12 13:56:59 -07:00
Kevin H. Luu
b75bce1008
[ci] Add grouped tests & mark tests to run by default for fastcheck pipeline ( #6365 )
...
Signed-off-by: kevin <kevin@anyscale.com>
2024-07-12 09:58:38 -07:00
Alexei-V-Ivanov-AMD
f9d25c2519
[Build/CI] Checking/Waiting for the GPU's clean state ( #6379 )
2024-07-12 09:42:24 -07:00
Robert Shaw
aea19f0989
[ Misc ] Support Models With Bias in compressed-tensors integration ( #6356 )
2024-07-12 11:11:29 -04:00
adityagoel14
d26a8b3f1f
[CI/Build] (2/2) Switching AMD CI to store images in Docker Hub ( #6350 )
2024-07-11 21:26:26 -07:00
Lily Liu
d6ab528997
[Misc] Remove flashinfer warning, add flashinfer tests to CI ( #6351 )
2024-07-12 01:32:06 +00:00
Robert Shaw
7ed6a4f0e1
[ BugFix ] Prompt Logprobs Detokenization ( #6223 )
...
Co-authored-by: Zifei Tong <zifeitong@gmail.com>
2024-07-11 22:02:29 +00:00
Kuntai Du
a4feba929b
[CI/Build] Add nightly benchmarking for tgi, tensorrt-llm and lmdeploy ( #5362 )
2024-07-11 13:28:38 -07:00
Simon Mo
52b7fcb35a
Benchmark: add H100 suite ( #6047 )
2024-07-11 09:17:07 -07:00
Kevin H. Luu
a0550cbc80
Add support for multi-node on CI ( #5955 )
...
Signed-off-by: kevin <kevin@anyscale.com>
2024-07-09 12:56:56 -07:00
Robert Shaw
abfe705a02
[ Misc ] Support Fp8 via llm-compressor ( #6110 )
...
Co-authored-by: Robert Shaw <rshaw@neuralmagic>
2024-07-07 20:42:11 +00:00
Simon Mo
bc96d5c330
Move release wheel env var to Dockerfile instead ( #6163 )
2024-07-05 17:19:53 -07:00
Simon Mo
f0250620dd
Fix release wheel build env var ( #6162 )
2024-07-05 16:24:31 -07:00
Simon Mo
2de490d60f
Update wheel builds to strip debug ( #6161 )
2024-07-05 14:51:25 -07:00
Lily Liu
69ec3ca14c
[Kernel][Model] logits_soft_cap for Gemma2 with flashinfer ( #6051 )
...
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-07-04 16:35:51 -07:00
Yuan
81d7a50f24
[Hardware][Intel CPU] Adding intel openmp tunings in Docker file ( #6008 )
...
Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
2024-07-04 15:22:12 -07:00
youkaichao
3de6e6a30e
[core][distributed] support n layers % pp size != 0 ( #6115 )
2024-07-03 16:40:31 -07:00
Mor Zusman
9d6a8daa87
[Model] Jamba support ( #4115 )
...
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: Erez Schwartz <erezs@ai21.com>
Co-authored-by: Mor Zusman <morz@ai21.com>
Co-authored-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>
Co-authored-by: Tomer Asida <tomera@ai21.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
Co-authored-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
2024-07-02 23:11:29 +00:00
Robert Shaw
7c008c51a9
[ Misc ] Refactor MoE to isolate Fp8 From Mixtral ( #5970 )
...
Co-authored-by: Robert Shaw <rshaw@neuralmagic>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-07-02 21:54:35 +00:00
Murali Andoorveedu
c5832d2ae9
[Core] Pipeline Parallel Support ( #4412 )
...
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
2024-07-02 10:58:08 -07:00
xwjiang2010
98d6682cd1
[VLM] Remove image_input_type from VLM config ( #5852 )
...
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-02 07:57:09 +00:00
Robert Shaw
d76084c12f
[ CI ] Re-enable Large Model LM Eval ( #6031 )
2024-07-01 12:40:45 -04:00
Robert Shaw
deacb7ec44
[ CI ] Temporarily Disable Large LM-Eval Tests ( #6005 )
...
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic>
2024-06-30 11:56:56 -07:00
SangBin Cho
f5e73c9f1b
[Lora] Use safetensor keys instead of adapter_config.json to find unexpected modules. ( #5909 )
...
Co-authored-by: sang <sangcho@anyscale.com>
2024-06-30 17:11:15 +00:00
youkaichao
2be6955a3f
[ci][distributed] fix device count call
...
[ci][distributed] fix some cuda init that makes it necessary to use spawn (#5991 )
2024-06-30 08:06:13 +00:00
Cyrus Leung
9d47f64eb6
[CI/Build] [3/3] Reorganize entrypoints tests ( #5966 )
2024-06-30 12:58:49 +08:00
Roger Wang
bcc6a09b63
[CI/Build] Temporarily Remove Phi3-Vision from TP Test ( #5989 )
2024-06-30 09:18:31 +08:00
Robert Shaw
75aa1442db
[ CI/Build ] LM Eval Harness Based CI Testing ( #5838 )
...
Co-authored-by: Robert Shaw <rshaw@neuralmagic>
2024-06-29 13:04:30 -04:00
Cyrus Leung
99397da534
[CI/Build] Add TP test for vision models ( #5892 )
2024-06-29 15:45:54 +00:00
Lily Liu
7041de4384
[Kernel] Flashinfer for prefill & decode, with Cudagraph support for decode ( #4628 )
...
Co-authored-by: LiuXiaoxuanPKU <llilyliupku@gmail.com>
Co-authored-by: bong-furiosa <bongwon.jang@furiosa.ai>
2024-06-28 15:28:49 -07:00
Ilya Lavrenov
57f09a419c
[Hardware][Intel] OpenVINO vLLM backend ( #5379 )
2024-06-28 13:50:16 +00:00
xwjiang2010
d12af207d2
[VLM][Bugfix] Make sure that multi_modal_kwargs is broadcasted properly ( #5880 )
...
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
2024-06-27 15:15:24 +08:00
Woo-Yeon Lee
2ce5d6688b
[Speculative Decoding] Support draft model on different tensor-parallel size than target model ( #5414 )
2024-06-25 09:56:06 +00:00
Kevin H. Luu
e9de9dd551
[ci] Remove aws template ( #5757 )
...
Signed-off-by: kevin <kevin@anyscale.com>
2024-06-24 21:09:02 -07:00
Kunshang Ji
cf90ae0123
[CI][Hardware][Intel GPU] add Intel GPU(XPU) ci pipeline ( #5616 )
2024-06-21 17:09:34 -07:00
youkaichao
7187507301
[ci][test] fix ca test in main ( #5746 )
2024-06-21 14:04:26 -07:00
youkaichao
d9a252bc8e
[Core][Distributed] add shm broadcast ( #5399 )
...
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
2024-06-21 05:12:35 +00:00
youkaichao
6c5b7af152
[distributed][misc] use fork by default for mp ( #5669 )
2024-06-20 17:06:34 -07:00
Kevin H. Luu
949e49a685
[ci] Limit num gpus if specified for A100 ( #5694 )
...
Signed-off-by: kevin <kevin@anyscale.com>
2024-06-19 16:30:03 -07:00
youkaichao
d571ca0108
[ci][distributed] add tests for custom allreduce ( #5689 )
2024-06-19 20:16:04 +00:00
Kevin H. Luu
3ee5c4bca5
[ci] Add A100 queue into AWS CI template ( #5648 )
...
Signed-off-by: kevin <kevin@anyscale.com>
2024-06-19 08:42:13 -06:00
Kevin H. Luu
19091efc44
[ci] Setup Release pipeline and build release wheels with cache ( #5610 )
...
Signed-off-by: kevin <kevin@anyscale.com>
2024-06-18 11:00:36 -07:00
Ronen Schaffer
7879f24dcc
[Misc] Add OpenTelemetry support ( #4687 )
...
This PR adds basic support for OpenTelemetry distributed tracing.
It includes changes to enable tracing functionality and improve monitoring capabilities.
I've also added a markdown guide with screenshots showing users how to use this feature.
2024-06-19 01:17:03 +09:00
Kevin H. Luu
13db4369d9
[ci] Deprecate original CI template ( #5624 )
...
Signed-off-by: kevin <kevin@anyscale.com>
2024-06-18 14:26:20 +00:00
Roger Wang
4ad7b53e59
[CI/Build][Misc] Update Pytest Marker for VLMs ( #5623 )
2024-06-18 13:10:04 +00:00
Kuntai Du
114d7270ff
[CI] Avoid naming different metrics with the same name in performance benchmark ( #5615 )
2024-06-17 21:37:18 -07:00
Cyrus Leung
32c86e494a
[Misc] Fix typo ( #5618 )
2024-06-17 20:58:30 -07:00
Kuntai Du
9e4e6fe207
[CI] Improve the readability of performance benchmarking results and prepare for upcoming performance dashboard ( #5571 )
2024-06-17 11:41:08 -07:00
Jie Fu (傅杰)
ab66536dbf
[CI/BUILD] Support non-AVX512 vLLM building and testing ( #5574 )
2024-06-17 14:36:10 -04:00
Kunshang Ji
728c4c8a06
[Hardware][Intel GPU] Add Intel GPU(XPU) inference backend ( #3814 )
...
Co-authored-by: Jiang Li <jiang1.li@intel.com>
Co-authored-by: Abhilash Majumder <abhilash.majumder@intel.com>
Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
2024-06-17 11:01:25 -07:00
Antoni Baum
f31c1f90e3
Add basic correctness 2 GPU tests to 4 GPU pipeline ( #5518 )
2024-06-16 07:48:02 +00:00
Simon Mo
bd7efe95d0
Add ccache to amd ( #5555 )
2024-06-14 17:18:22 -07:00
Cyrus Leung
d47af2bc02
[CI/Build] Disable LLaVA-NeXT CPU test ( #5529 )
2024-06-14 09:27:30 -07:00
Kuntai Du
319ad7f1d3
[CI/Build][Misc] Add CI that benchmarks vllm performance on those PRs with perf-benchmarks label ( #5073 )
...
Co-authored-by: simon-mo <simon.mo@hey.com>
2024-06-13 22:36:20 -07:00
Antoni Baum
50eed24d25
Add cuda_device_count_stateless ( #5473 )
2024-06-13 16:06:49 -07:00
Kevin H. Luu
916d219d62
[ci] Use sccache to build images ( #5419 )
...
Signed-off-by: kevin <kevin@anyscale.com>
2024-06-12 17:58:12 -07:00
Kevin H. Luu
8b82a89997
[ci] Add AMD, Neuron, Intel tests for AWS CI and turn off default soft fail for GPU tests ( #5464 )
...
Signed-off-by: kevin <kevin@anyscale.com>
2024-06-12 14:00:18 -07:00
youkaichao
c4bd03c7c5
[Core][Distributed] add same-node detection ( #5369 )
2024-06-11 10:53:59 -07:00
Kevin H. Luu
76477a93b7
[ci] Fix Buildkite agent path ( #5392 )
...
Signed-off-by: kevin <kevin@anyscale.com>
2024-06-10 18:58:07 -07:00
Kevin H. Luu
c5602f0baa
[ci] Mount buildkite agent on Docker container to upload benchmark results ( #5330 )
...
Signed-off-by: kevin <kevin@anyscale.com>
2024-06-10 09:22:34 -07:00
Kevin H. Luu
f7f9c5f97b
[ci] Use small_cpu_queue for doc build ( #5331 )
...
Signed-off-by: kevin <kevin@anyscale.com>
2024-06-10 09:21:11 -07:00