SangBin Cho
|
c01a6cb231
|
[Ray backend] Better error when pg topology is bad. (#7584)
Co-authored-by: youkaichao <youkaichao@126.com>
|
2024-08-22 17:44:25 -07:00 |
|
Peter Salas
|
1ca0d4f86b
|
[Model] Add UltravoxModel and UltravoxConfig (#7615)
|
2024-08-21 22:49:39 +00:00 |
|
SangBin Cho
|
ff7ec82c4d
|
[Core] Optimize SPMD architecture with delta + serialization optimization (#7109)
|
2024-08-18 17:57:20 -07:00 |
|
SangBin Cho
|
4706eb628e
|
[aDAG] Unflake aDAG + PP tests (#7600)
|
2024-08-16 20:49:30 -07:00 |
|
jon-chuang
|
50b8d08dbd
|
[Misc/Testing] Use torch.testing.assert_close (#7324)
|
2024-08-16 04:24:04 +00:00 |
|
youkaichao
|
4cd7d47fed
|
[ci/test] rearrange tests and make adag test soft fail (#7572)
|
2024-08-15 19:39:04 -07:00 |
|
youkaichao
|
16422ea76f
|
[misc][plugin] add plugin system implementation (#7426)
|
2024-08-13 16:24:17 -07:00 |
|
Cyrus Leung
|
7025b11d94
|
[Bugfix] Fix weight loading for Chameleon when TP>1 (#7410)
|
2024-08-13 05:33:41 +00:00 |
|
Rui Qiao
|
198d6a2898
|
[Core] Shut down aDAG workers with clean async llm engine exit (#7224)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
|
2024-08-12 17:57:16 -07:00 |
|
Cyrus Leung
|
7eb4a51c5f
|
[Core] Support serving encoder/decoder models (#7258)
|
2024-08-09 10:39:41 +08:00 |
|
afeldman-nm
|
fd95e026e0
|
[Core] Subclass ModelRunner to support cross-attention & encoder sequences (towards eventual encoder/decoder model support) (#4942)
Co-authored-by: Andrew Feldman <afeld2012@gmail.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
|
2024-08-06 16:51:47 -04:00 |
|
youkaichao
|
a0d164567c
|
[ci][distributed] disable ray dag tests (#7099)
|
2024-08-02 22:32:04 -07:00 |
|
youkaichao
|
04e5583425
|
[ci][distributed] merge distributed test commands (#7097)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2024-08-02 21:33:53 -07:00 |
|
Rui Qiao
|
05308891e2
|
[Core] Pipeline parallel with Ray ADAG (#6837)
Support pipeline-parallelism with Ray accelerated DAG.
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
|
2024-08-02 13:55:40 -07:00 |
|
youkaichao
|
252357793d
|
[ci][distributed] try to fix pp test (#7054)
|
2024-08-01 22:03:12 -07:00 |
|
Cody Yu
|
bd70013407
|
[MISC] Introduce pipeline parallelism partition strategies (#6920)
Co-authored-by: youkaichao <youkaichao@126.com>
|
2024-07-31 12:02:17 -07:00 |
|
Cyrus Leung
|
f230cc2ca6
|
[Bugfix] Fix broadcasting logic for multi_modal_kwargs (#6836)
|
2024-07-31 10:38:45 +08:00 |
|
youkaichao
|
443c7cf4cf
|
[ci][distributed] fix flaky tests (#6806)
|
2024-07-25 17:44:09 -07:00 |
|
William Lin
|
5e8ca973eb
|
[Bugfix] fix flashinfer cudagraph capture for PP (#6708)
|
2024-07-24 01:49:44 +00:00 |
|
Nick Hill
|
b5672a112c
|
[Core] Multiprocessing Pipeline Parallel support (#6130)
Co-authored-by: Murali Andoorveedu <muralidhar.andoorveedu@centml.ai>
|
2024-07-18 19:15:52 -07:00 |
|
youkaichao
|
f53b8f0d05
|
[ci][test] add correctness test for cpu offloading (#6549)
|
2024-07-18 23:41:06 +00:00 |
|
Cody Yu
|
b5af8c223c
|
[Model] Pipeline parallel support for Mixtral (#6516)
|
2024-07-17 19:26:04 -07:00 |
|
Murali Andoorveedu
|
5fa6e9876e
|
[Bugfix] Fix for multinode crash on 4 PP (#6495)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
|
2024-07-17 08:25:10 +00:00 |
|
Cyrus Leung
|
5bf35a91e4
|
[Doc][CI/Build] Update docs and tests to use vllm serve (#6431)
|
2024-07-17 07:43:21 +00:00 |
|
youkaichao
|
7f62077af5
|
[misc][distributed] improve tests (#6488)
|
2024-07-16 17:35:52 -07:00 |
|
youkaichao
|
09c2eb85dd
|
[ci][distributed] add pipeline parallel correctness test (#6410)
|
2024-07-16 15:44:22 -07:00 |
|
youkaichao
|
41708e5034
|
[ci] try to add multi-node tests (#6280)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
|
2024-07-12 21:51:48 -07:00 |
|
Hongxia Yang
|
b6c16cf8ff
|
[ROCm][AMD] unify CUDA_VISIBLE_DEVICES usage in cuda/rocm (#6352)
|
2024-07-11 21:30:46 -07:00 |
|
youkaichao
|
da78caecfa
|
[core][distributed] zmq fallback for broadcasting large objects (#6183)
[core][distributed] add zmq fallback for broadcasting large objects (#6183)
|
2024-07-09 18:49:11 -07:00 |
|
xwjiang2010
|
d9e98f42e4
|
[vlm] Remove vision language config. (#6089)
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-07-03 22:14:16 +00:00 |
|
Cyrus Leung
|
9831aec49f
|
[Core] Dynamic image size support for VLMs (#5276)
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: ywang96 <ywang@roblox.com>
Co-authored-by: xwjiang2010 <87673679+xwjiang2010@users.noreply.github.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
|
2024-07-02 20:34:00 -07:00 |
|
Murali Andoorveedu
|
c5832d2ae9
|
[Core] Pipeline Parallel Support (#4412)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
|
2024-07-02 10:58:08 -07:00 |
|
youkaichao
|
2be6955a3f
|
[ci][distributed] fix device count call
[ci][distributed] fix some cuda init that makes it necessary to use spawn (#5991)
|
2024-06-30 08:06:13 +00:00 |
|
Cyrus Leung
|
cff6a1fec1
|
[CI/Build] Reuse code for checking output consistency (#5988)
|
2024-06-30 11:44:25 +08:00 |
|
Cyrus Leung
|
99397da534
|
[CI/Build] Add TP test for vision models (#5892)
|
2024-06-29 15:45:54 +00:00 |
|
Lily Liu
|
7041de4384
|
[Kernel] Flashinfer for prefill & decode, with Cudagraph support for decode (#4628)
Co-authored-by: LiuXiaoxuanPKU <llilyliupku@gmail.com>, bong-furiosa <bongwon.jang@furiosa.ai>
|
2024-06-28 15:28:49 -07:00 |
|
xwjiang2010
|
b90d8cd832
|
[Distributed] Make it clear that % should not be in tensor dict keys. (#5927)
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
|
2024-06-28 15:20:22 +00:00 |
|
xwjiang2010
|
d12af207d2
|
[VLM][Bugfix] Make sure that multi_modal_kwargs is broadcasted properly (#5880)
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
|
2024-06-27 15:15:24 +08:00 |
|
youkaichao
|
515080ad2f
|
[bugfix][distributed] fix shm broadcast when the queue size is full (#5801)
|
2024-06-25 21:56:02 -07:00 |
|
Matt Wong
|
dd793d1de5
|
[Hardware][AMD][CI/Build][Doc] Upgrade to ROCm 6.1, Dockerfile improvements, test fixes (#5422)
|
2024-06-25 15:56:15 -07:00 |
|
Murali Andoorveedu
|
5d4d90536f
|
[Distributed] Add send and recv helpers (#5719)
|
2024-06-23 14:42:28 -07:00 |
|
youkaichao
|
d9a252bc8e
|
[Core][Distributed] add shm broadcast (#5399)
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
|
2024-06-21 05:12:35 +00:00 |
|
youkaichao
|
d571ca0108
|
[ci][distributed] add tests for custom allreduce (#5689)
|
2024-06-19 20:16:04 +00:00 |
|
Cyrus Leung
|
0e9164b40a
|
[mypy] Enable type checking for test directory (#5017)
|
2024-06-15 04:45:31 +00:00 |
|
youkaichao
|
48f589e18b
|
[mis] fix flaky test of test_cuda_device_count_stateless (#5546)
|
2024-06-14 10:02:23 -07:00 |
|
Antoni Baum
|
50eed24d25
|
Add cuda_device_count_stateless (#5473)
|
2024-06-13 16:06:49 -07:00 |
|
youkaichao
|
ea3890a5f0
|
[Core][Distributed] code deduplication in tp&pp with coordinator(#5293)
[Core][Distributed] add coordinator to reduce code duplication in tp and pp (#5293)
|
2024-06-12 17:27:08 -07:00 |
|
youkaichao
|
c4bd03c7c5
|
[Core][Distributed] add same-node detection (#5369)
|
2024-06-11 10:53:59 -07:00 |
|
youkaichao
|
8ea5e44a43
|
[CI/Test] improve robustness of test (vllm_runner) (#5357)
[CI/Test] improve robustness of test by replacing del with context manager (vllm_runner) (#5357)
|
2024-06-08 08:59:20 +00:00 |
|
youkaichao
|
9fb900f90c
|
[CI/Test] improve robustness of test (hf_runner) (#5347)
[CI/Test] improve robustness of test by replacing del with context manager (hf_runner) (#5347)
|
2024-06-07 22:31:32 -07:00 |
|