Commit Graph

93 Commits

Author SHA1 Message Date
Simon Mo
40c27a7cbb
[Build] Temporarily Disable Kernels and LoRA tests (#6961) 2024-07-30 14:59:48 -07:00
Roger Wang
ecb33a28cb
[CI/Build][Doc] Update CI and Doc for VLM example changes (#6860) 2024-07-27 09:54:14 +00:00
Sanger Steel
969d032265
[Bugfix]: Fix Tensorizer test failures (#6835) 2024-07-26 20:02:25 -07:00
Kevin H. Luu
2eb9f4ff26
[ci] Mark tensorizer as soft fail and separate from grouped test (#6810)
[ci] Mark tensorizer test as soft fail and separate it from grouped test in fast check (#6810)
Signed-off-by: kevin <kevin@anyscale.com>
2024-07-25 18:08:33 -07:00
Matt Wong
06d6c5fe9f
[Bugfix][CI/Build][Hardware][AMD] Fix AMD tests, add HF cache, update CK FA, add partially supported model notes (#6543) 2024-07-20 09:39:07 -07:00
Nick Hill
b5672a112c
[Core] Multiprocessing Pipeline Parallel support (#6130)
Co-authored-by: Murali Andoorveedu <muralidhar.andoorveedu@centml.ai>
2024-07-18 19:15:52 -07:00
Tyler Michael Smith
1689219ebf
[CI/Build] Build on Ubuntu 20.04 instead of 22.04 (#6517) 2024-07-18 17:29:25 -07:00
youkaichao
f53b8f0d05
[ci][test] add correctness test for cpu offloading (#6549) 2024-07-18 23:41:06 +00:00
Rui Qiao
61e592747c
[Core] Introduce SPMD worker execution using Ray accelerated DAG (#6032)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>
2024-07-17 22:27:09 -07:00
youkaichao
1c27d25fb5
[core][model] yet another cpu offload implementation (#6496)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-07-17 20:54:35 -07:00
youkaichao
09c2eb85dd
[ci][distributed] add pipeline parallel correctness test (#6410) 2024-07-16 15:44:22 -07:00
Cyrus Leung
d97011512e
[CI/Build] vLLM cache directory for images (#6444) 2024-07-15 23:12:25 -07:00
youkaichao
69672f116c
[core][distributed] simplify code to support pipeline parallel (#6406) 2024-07-14 21:20:51 -07:00
youkaichao
ccd3c04571
[ci][build] fix commit id (#6420)
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2024-07-14 22:16:21 +08:00
youkaichao
41708e5034
[ci] try to add multi-node tests (#6280)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
2024-07-12 21:51:48 -07:00
Kevin H. Luu
b75bce1008
[ci] Add grouped tests & mark tests to run by default for fastcheck pipeline (#6365)
Signed-off-by: kevin <kevin@anyscale.com>
2024-07-12 09:58:38 -07:00
Lily Liu
d6ab528997
[Misc] Remove flashinfer warning, add flashinfer tests to CI (#6351) 2024-07-12 01:32:06 +00:00
Robert Shaw
7ed6a4f0e1
[ BugFix ] Prompt Logprobs Detokenization (#6223)
Co-authored-by: Zifei Tong <zifeitong@gmail.com>
2024-07-11 22:02:29 +00:00
Lily Liu
69ec3ca14c
[Kernel][Model] logits_soft_cap for Gemma2 with flashinfer (#6051)
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-07-04 16:35:51 -07:00
youkaichao
3de6e6a30e
[core][distributed] support n layers % pp size != 0 (#6115) 2024-07-03 16:40:31 -07:00
Murali Andoorveedu
c5832d2ae9
[Core] Pipeline Parallel Support (#4412)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
2024-07-02 10:58:08 -07:00
Robert Shaw
d76084c12f
[ CI ] Re-enable Large Model LM Eval (#6031) 2024-07-01 12:40:45 -04:00
Robert Shaw
deacb7ec44
[ CI ] Temporarily Disable Large LM-Eval Tests (#6005)
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic>
2024-06-30 11:56:56 -07:00
SangBin Cho
f5e73c9f1b
[Lora] Use safetensor keys instead of adapter_config.json to find unexpected modules. (#5909)
Co-authored-by: sang <sangcho@anyscale.com>
2024-06-30 17:11:15 +00:00
youkaichao
2be6955a3f
[ci][distributed] fix device count call
[ci][distributed] fix some cuda init that makes it necessary to use spawn (#5991)
2024-06-30 08:06:13 +00:00
Cyrus Leung
9d47f64eb6
[CI/Build] [3/3] Reorganize entrypoints tests (#5966) 2024-06-30 12:58:49 +08:00
Roger Wang
bcc6a09b63
[CI/Build] Temporarily Remove Phi3-Vision from TP Test (#5989) 2024-06-30 09:18:31 +08:00
Robert Shaw
75aa1442db
[ CI/Build ] LM Eval Harness Based CI Testing (#5838)
Co-authored-by: Robert Shaw <rshaw@neuralmagic>
2024-06-29 13:04:30 -04:00
Cyrus Leung
99397da534
[CI/Build] Add TP test for vision models (#5892) 2024-06-29 15:45:54 +00:00
Lily Liu
7041de4384
[Kernel] Flashinfer for prefill & decode, with Cudagraph support for decode (#4628)
Co-authored-by: LiuXiaoxuanPKU <llilyliupku@gmail.com>, bong-furiosa <bongwon.jang@furiosa.ai>
2024-06-28 15:28:49 -07:00
xwjiang2010
d12af207d2
[VLM][Bugfix] Make sure that multi_modal_kwargs is broadcasted properly (#5880)
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
2024-06-27 15:15:24 +08:00
Woo-Yeon Lee
2ce5d6688b
[Speculative Decoding] Support draft model on different tensor-parallel size than target model (#5414) 2024-06-25 09:56:06 +00:00
Kevin H. Luu
e9de9dd551
[ci] Remove aws template (#5757)
Signed-off-by: kevin <kevin@anyscale.com>
2024-06-24 21:09:02 -07:00
youkaichao
7187507301
[ci][test] fix ca test in main (#5746) 2024-06-21 14:04:26 -07:00
youkaichao
d9a252bc8e
[Core][Distributed] add shm broadcast (#5399)
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
2024-06-21 05:12:35 +00:00
youkaichao
6c5b7af152
[distributed][misc] use fork by default for mp (#5669) 2024-06-20 17:06:34 -07:00
Kevin H. Luu
949e49a685
[ci] Limit num gpus if specified for A100 (#5694)
Signed-off-by: kevin <kevin@anyscale.com>
2024-06-19 16:30:03 -07:00
youkaichao
d571ca0108
[ci][distributed] add tests for custom allreduce (#5689) 2024-06-19 20:16:04 +00:00
Kevin H. Luu
3ee5c4bca5
[ci] Add A100 queue into AWS CI template (#5648)
Signed-off-by: kevin <kevin@anyscale.com>
2024-06-19 08:42:13 -06:00
Ronen Schaffer
7879f24dcc
[Misc] Add OpenTelemetry support (#4687)
This PR adds basic support for OpenTelemetry distributed tracing.
It includes changes to enable tracing functionality and improve monitoring capabilities.

I've also added a Markdown guide with screenshots showing how to use this feature.
2024-06-19 01:17:03 +09:00
Kevin H. Luu
13db4369d9
[ci] Deprecate original CI template (#5624)
Signed-off-by: kevin <kevin@anyscale.com>
2024-06-18 14:26:20 +00:00
Roger Wang
4ad7b53e59
[CI/Build][Misc] Update Pytest Marker for VLMs (#5623) 2024-06-18 13:10:04 +00:00
Antoni Baum
f31c1f90e3
Add basic correctness 2 GPU tests to 4 GPU pipeline (#5518) 2024-06-16 07:48:02 +00:00
Antoni Baum
50eed24d25
Add cuda_device_count_stateless (#5473) 2024-06-13 16:06:49 -07:00
youkaichao
c4bd03c7c5
[Core][Distributed] add same-node detection (#5369) 2024-06-11 10:53:59 -07:00
Antoni Baum
ccdc490dda
[Core] Change LoRA embedding sharding to support loading methods (#5038) 2024-06-06 19:07:57 -07:00
Cyrus Leung
89c920785f
[CI/Build] Update vision tests (#5307) 2024-06-06 05:17:18 -05:00
Simon Mo
3a6ae1d33c
[CI] Disable flash_attn backend for spec decode (#5286) 2024-06-05 15:49:27 -07:00
Cyrus Leung
ec784b2526
[CI/Build] Add inputs tests (#5215) 2024-06-03 21:01:46 -07:00
youkaichao
5bd3c65072
[Core][Optimization] remove vllm-nccl (#5091) 2024-05-29 05:13:52 +00:00