Simon Mo
40c27a7cbb
[Build] Temporarily Disable Kernels and LoRA tests ( #6961 )
2024-07-30 14:59:48 -07:00
Roger Wang
ecb33a28cb
[CI/Build][Doc] Update CI and Doc for VLM example changes ( #6860 )
2024-07-27 09:54:14 +00:00
Sanger Steel
969d032265
[Bugfix]: Fix Tensorizer test failures ( #6835 )
2024-07-26 20:02:25 -07:00
Kevin H. Luu
2eb9f4ff26
[ci] Mark tensorizer as soft fail and separate from grouped test ( #6810 )
...
[ci] Mark tensorizer test as soft fail and separate it from grouped test in fast check (#6810 )
Signed-off-by: kevin <kevin@anyscale.com>
2024-07-25 18:08:33 -07:00
Matt Wong
06d6c5fe9f
[Bugfix][CI/Build][Hardware][AMD] Fix AMD tests, add HF cache, update CK FA, add partially supported model notes ( #6543 )
2024-07-20 09:39:07 -07:00
Nick Hill
b5672a112c
[Core] Multiprocessing Pipeline Parallel support ( #6130 )
...
Co-authored-by: Murali Andoorveedu <muralidhar.andoorveedu@centml.ai>
2024-07-18 19:15:52 -07:00
Tyler Michael Smith
1689219ebf
[CI/Build] Build on Ubuntu 20.04 instead of 22.04 ( #6517 )
2024-07-18 17:29:25 -07:00
youkaichao
f53b8f0d05
[ci][test] add correctness test for cpu offloading ( #6549 )
2024-07-18 23:41:06 +00:00
Rui Qiao
61e592747c
[Core] Introduce SPMD worker execution using Ray accelerated DAG ( #6032 )
...
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>
2024-07-17 22:27:09 -07:00
youkaichao
1c27d25fb5
[core][model] yet another cpu offload implementation ( #6496 )
...
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-07-17 20:54:35 -07:00
youkaichao
09c2eb85dd
[ci][distributed] add pipeline parallel correctness test ( #6410 )
2024-07-16 15:44:22 -07:00
Cyrus Leung
d97011512e
[CI/Build] vLLM cache directory for images ( #6444 )
2024-07-15 23:12:25 -07:00
youkaichao
69672f116c
[core][distributed] simplify code to support pipeline parallel ( #6406 )
2024-07-14 21:20:51 -07:00
youkaichao
ccd3c04571
[ci][build] fix commit id ( #6420 )
...
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2024-07-14 22:16:21 +08:00
youkaichao
41708e5034
[ci] try to add multi-node tests ( #6280 )
...
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
2024-07-12 21:51:48 -07:00
Kevin H. Luu
b75bce1008
[ci] Add grouped tests & mark tests to run by default for fastcheck pipeline ( #6365 )
...
Signed-off-by: kevin <kevin@anyscale.com>
2024-07-12 09:58:38 -07:00
Lily Liu
d6ab528997
[Misc] Remove flashinfer warning, add flashinfer tests to CI ( #6351 )
2024-07-12 01:32:06 +00:00
Robert Shaw
7ed6a4f0e1
[ BugFix ] Prompt Logprobs Detokenization ( #6223 )
...
Co-authored-by: Zifei Tong <zifeitong@gmail.com>
2024-07-11 22:02:29 +00:00
Lily Liu
69ec3ca14c
[Kernel][Model] logits_soft_cap for Gemma2 with flashinfer ( #6051 )
...
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-07-04 16:35:51 -07:00
youkaichao
3de6e6a30e
[core][distributed] support n layers % pp size != 0 ( #6115 )
2024-07-03 16:40:31 -07:00
Murali Andoorveedu
c5832d2ae9
[Core] Pipeline Parallel Support ( #4412 )
...
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
2024-07-02 10:58:08 -07:00
Robert Shaw
d76084c12f
[ CI ] Re-enable Large Model LM Eval ( #6031 )
2024-07-01 12:40:45 -04:00
Robert Shaw
deacb7ec44
[ CI ] Temporarily Disable Large LM-Eval Tests ( #6005 )
...
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic>
2024-06-30 11:56:56 -07:00
SangBin Cho
f5e73c9f1b
[Lora] Use safetensor keys instead of adapter_config.json to find unexpected modules. ( #5909 )
...
Co-authored-by: sang <sangcho@anyscale.com>
2024-06-30 17:11:15 +00:00
youkaichao
2be6955a3f
[ci][distributed] fix device count call
...
[ci][distributed] fix some cuda init that makes it necessary to use spawn (#5991 )
2024-06-30 08:06:13 +00:00
Cyrus Leung
9d47f64eb6
[CI/Build] [3/3] Reorganize entrypoints tests ( #5966 )
2024-06-30 12:58:49 +08:00
Roger Wang
bcc6a09b63
[CI/Build] Temporarily Remove Phi3-Vision from TP Test ( #5989 )
2024-06-30 09:18:31 +08:00
Robert Shaw
75aa1442db
[ CI/Build ] LM Eval Harness Based CI Testing ( #5838 )
...
Co-authored-by: Robert Shaw <rshaw@neuralmagic>
2024-06-29 13:04:30 -04:00
Cyrus Leung
99397da534
[CI/Build] Add TP test for vision models ( #5892 )
2024-06-29 15:45:54 +00:00
Lily Liu
7041de4384
[Kernel] Flashinfer for prefill & decode, with Cudagraph support for decode ( #4628 )
...
Co-authored-by: LiuXiaoxuanPKU <llilyliupku@gmail.com>, bong-furiosa <bongwon.jang@furiosa.ai>
2024-06-28 15:28:49 -07:00
xwjiang2010
d12af207d2
[VLM][Bugfix] Make sure that multi_modal_kwargs is broadcasted properly ( #5880 )
...
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
2024-06-27 15:15:24 +08:00
Woo-Yeon Lee
2ce5d6688b
[Speculative Decoding] Support draft model on different tensor-parallel size than target model ( #5414 )
2024-06-25 09:56:06 +00:00
Kevin H. Luu
e9de9dd551
[ci] Remove aws template ( #5757 )
...
Signed-off-by: kevin <kevin@anyscale.com>
2024-06-24 21:09:02 -07:00
youkaichao
7187507301
[ci][test] fix ca test in main ( #5746 )
2024-06-21 14:04:26 -07:00
youkaichao
d9a252bc8e
[Core][Distributed] add shm broadcast ( #5399 )
...
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
2024-06-21 05:12:35 +00:00
youkaichao
6c5b7af152
[distributed][misc] use fork by default for mp ( #5669 )
2024-06-20 17:06:34 -07:00
Kevin H. Luu
949e49a685
[ci] Limit num gpus if specified for A100 ( #5694 )
...
Signed-off-by: kevin <kevin@anyscale.com>
2024-06-19 16:30:03 -07:00
youkaichao
d571ca0108
[ci][distributed] add tests for custom allreduce ( #5689 )
2024-06-19 20:16:04 +00:00
Kevin H. Luu
3ee5c4bca5
[ci] Add A100 queue into AWS CI template ( #5648 )
...
Signed-off-by: kevin <kevin@anyscale.com>
2024-06-19 08:42:13 -06:00
Ronen Schaffer
7879f24dcc
[Misc] Add OpenTelemetry support ( #4687 )
...
This PR adds basic support for OpenTelemetry distributed tracing.
It includes changes to enable tracing functionality and improve monitoring capabilities.
I've also added a markdown document with screenshots to guide users on how to use this feature.
2024-06-19 01:17:03 +09:00
Kevin H. Luu
13db4369d9
[ci] Deprecate original CI template ( #5624 )
...
Signed-off-by: kevin <kevin@anyscale.com>
2024-06-18 14:26:20 +00:00
Roger Wang
4ad7b53e59
[CI/Build][Misc] Update Pytest Marker for VLMs ( #5623 )
2024-06-18 13:10:04 +00:00
Antoni Baum
f31c1f90e3
Add basic correctness 2 GPU tests to 4 GPU pipeline ( #5518 )
2024-06-16 07:48:02 +00:00
Antoni Baum
50eed24d25
Add cuda_device_count_stateless ( #5473 )
2024-06-13 16:06:49 -07:00
youkaichao
c4bd03c7c5
[Core][Distributed] add same-node detection ( #5369 )
2024-06-11 10:53:59 -07:00
Antoni Baum
ccdc490dda
[Core] Change LoRA embedding sharding to support loading methods ( #5038 )
2024-06-06 19:07:57 -07:00
Cyrus Leung
89c920785f
[CI/Build] Update vision tests ( #5307 )
2024-06-06 05:17:18 -05:00
Simon Mo
3a6ae1d33c
[CI] Disable flash_attn backend for spec decode ( #5286 )
2024-06-05 15:49:27 -07:00
Cyrus Leung
ec784b2526
[CI/Build] Add inputs tests ( #5215 )
2024-06-03 21:01:46 -07:00
youkaichao
5bd3c65072
[Core][Optimization] remove vllm-nccl ( #5091 )
2024-05-29 05:13:52 +00:00