youkaichao
|
639159b2a6
|
[distributed][misc] add specialized method for cuda platform (#7249)
|
2024-08-07 08:54:52 -07:00 |
|
Rui Qiao
|
997cf78308
|
[Misc] Fix typo in GroupCoordinator.recv() (#7167)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
|
2024-08-05 11:10:16 -07:00 |
|
youkaichao
|
16a1cc9bb2
|
[misc][distributed] improve libcudart.so finding (#7127)
|
2024-08-04 11:31:51 -07:00 |
|
Aurick Qiao
|
0437492ea9
|
PP comm optimization: replace send with partial send + allgather (#6695)
Co-authored-by: Aurick Qiao <aurick.qiao@snowflake.com>
|
2024-07-31 20:15:42 -07:00 |
|
Cody Yu
|
bd70013407
|
[MISC] Introduce pipeline parallelism partition strategies (#6920)
Co-authored-by: youkaichao <youkaichao@126.com>
|
2024-07-31 12:02:17 -07:00 |
|
Cyrus Leung
|
f230cc2ca6
|
[Bugfix] Fix broadcasting logic for multi_modal_kwargs (#6836)
|
2024-07-31 10:38:45 +08:00 |
|
Woosuk Kwon
|
fad5576c58
|
[TPU] Reduce compilation time & Upgrade PyTorch XLA version (#6856)
|
2024-07-27 10:28:33 -07:00 |
|
Woosuk Kwon
|
d09b94ca58
|
[TPU] Support collective communications in XLA devices (#6813)
|
2024-07-27 01:45:57 +00:00 |
|
Li, Jiang
|
3bbb4936dc
|
[Hardware] [Intel] Enable Multiprocessing and tensor parallel in CPU backend and update documentation (#6125)
|
2024-07-26 13:50:10 -07:00 |
|
Tyler Michael Smith
|
95db75de64
|
[Bugfix] Add synchronize to prevent possible data race (#6788)
Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
|
2024-07-25 10:40:01 -07:00 |
|
youkaichao
|
740374d456
|
[core][distributed] fix zmq hang (#6759)
|
2024-07-24 17:37:12 -07:00 |
|
Woosuk Kwon
|
b6df37f943
|
[Misc] Remove abused noqa (#6619)
|
2024-07-21 23:47:04 +08:00 |
|
youkaichao
|
07eb6f19f3
|
[bugfix][distributed] fix multi-node bug for shared memory (#6597)
|
2024-07-19 15:34:34 -07:00 |
|
Nick Hill
|
d25877dd9b
|
[BugFix] Avoid secondary error in ShmRingBuffer destructor (#6530)
|
2024-07-17 22:24:43 -07:00 |
|
youkaichao
|
7f62077af5
|
[misc][distributed] improve tests (#6488)
|
2024-07-16 17:35:52 -07:00 |
|
Cyrus Leung
|
d97011512e
|
[CI/Build] vLLM cache directory for images (#6444)
|
2024-07-15 23:12:25 -07:00 |
|
youkaichao
|
2b0fb53481
|
[distributed][misc] be consistent with pytorch for libcudart.so (#6346)
[distributed][misc] keep consistent with how pytorch finds libcudart.so (#6346)
|
2024-07-11 19:35:17 -07:00 |
|
youkaichao
|
da78caecfa
|
[core][distributed] zmq fallback for broadcasting large objects (#6183)
[core][distributed] add zmq fallback for broadcasting large objects (#6183)
|
2024-07-09 18:49:11 -07:00 |
|
youkaichao
|
3de6e6a30e
|
[core][distributed] support n layers % pp size != 0 (#6115)
|
2024-07-03 16:40:31 -07:00 |
|
youkaichao
|
3c6325f0fc
|
[core][distributed] custom allreduce when pp size > 1 (#6117)
|
2024-07-03 14:41:32 -07:00 |
|
Murali Andoorveedu
|
c5832d2ae9
|
[Core] Pipeline Parallel Support (#4412)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
|
2024-07-02 10:58:08 -07:00 |
|
youkaichao
|
614aa51203
|
[misc][cuda] use nvml to avoid accidentally cuda initialization (#6007)
|
2024-06-30 20:07:34 -07:00 |
|
Cyrus Leung
|
99397da534
|
[CI/Build] Add TP test for vision models (#5892)
|
2024-06-29 15:45:54 +00:00 |
|
xwjiang2010
|
b90d8cd832
|
[Distributed] Make it clear that % should not be in tensor dict keys. (#5927)
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
|
2024-06-28 15:20:22 +00:00 |
|
xwjiang2010
|
74d55c065b
|
[VLM][BugFix] Make sure that multi_modal_kwargs can broadcast properly with ring buffer. (#5905)
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-06-28 07:29:13 +00:00 |
|
xwjiang2010
|
d12af207d2
|
[VLM][Bugfix] Make sure that multi_modal_kwargs is broadcasted properly (#5880)
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
|
2024-06-27 15:15:24 +08:00 |
|
youkaichao
|
515080ad2f
|
[bugfix][distributed] fix shm broadcast when the queue size is full (#5801)
|
2024-06-25 21:56:02 -07:00 |
|
Matt Wong
|
dd793d1de5
|
[Hardware][AMD][CI/Build][Doc] Upgrade to ROCm 6.1, Dockerfile improvements, test fixes (#5422)
|
2024-06-25 15:56:15 -07:00 |
|
Woo-Yeon Lee
|
2ce5d6688b
|
[Speculative Decoding] Support draft model on different tensor-parallel size than target model (#5414)
|
2024-06-25 09:56:06 +00:00 |
|
Murali Andoorveedu
|
5d4d90536f
|
[Distributed] Add send and recv helpers (#5719)
|
2024-06-23 14:42:28 -07:00 |
|
youkaichao
|
832ea88fcb
|
[core][distributed] improve shared memory broadcast (#5754)
|
2024-06-22 10:00:43 -07:00 |
|
youkaichao
|
d9a252bc8e
|
[Core][Distributed] add shm broadcast (#5399)
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
|
2024-06-21 05:12:35 +00:00 |
|
youkaichao
|
6c5b7af152
|
[distributed][misc] use fork by default for mp (#5669)
|
2024-06-20 17:06:34 -07:00 |
|
youkaichao
|
db5ec52ad7
|
[bugfix][distributed] improve p2p capability test (#5612)
[bugfix][distributed] do not error if two processes do not agree on p2p capability (#5612)
|
2024-06-18 07:21:05 +00:00 |
|
Kunshang Ji
|
728c4c8a06
|
[Hardware][Intel GPU] Add Intel GPU(XPU) inference backend (#3814)
Co-authored-by: Jiang Li <jiang1.li@intel.com>
Co-authored-by: Abhilash Majumder <abhilash.majumder@intel.com>
Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
|
2024-06-17 11:01:25 -07:00 |
|
Cyrus Leung
|
0e9164b40a
|
[mypy] Enable type checking for test directory (#5017)
|
2024-06-15 04:45:31 +00:00 |
|
youkaichao
|
f5bb85b435
|
[Core][Distributed] improve p2p cache generation (#5528)
|
2024-06-14 14:47:45 -07:00 |
|
youkaichao
|
d1c3d7d139
|
[misc][distributed] fix benign error in is_in_the_same_node (#5512)
|
2024-06-14 10:59:28 -07:00 |
|
Antoni Baum
|
50eed24d25
|
Add cuda_device_count_stateless (#5473)
|
2024-06-13 16:06:49 -07:00 |
|
youkaichao
|
ea3890a5f0
|
[Core][Distributed] code deduplication in tp&pp with coordinator(#5293)
[Core][Distributed] add coordinator to reduce code duplication in tp and pp (#5293)
|
2024-06-12 17:27:08 -07:00 |
|
youkaichao
|
c4bd03c7c5
|
[Core][Distributed] add same-node detection (#5369)
|
2024-06-11 10:53:59 -07:00 |
|
youkaichao
|
c81da5f56d
|
[misc][typo] fix typo (#5372)
|
2024-06-10 09:51:02 +00:00 |
|
bnellnm
|
5467ac3196
|
[Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047)
|
2024-06-09 16:23:30 -04:00 |
|
youkaichao
|
594392d27a
|
[Core][Distributed] improve p2p access check (#4992)
|
2024-05-29 11:29:07 +00:00 |
|
youkaichao
|
5bd3c65072
|
[Core][Optimization] remove vllm-nccl (#5091)
|
2024-05-29 05:13:52 +00:00 |
|
Murali Andoorveedu
|
5eda2ea02a
|
[Core][1/N] Support send/recv in PyNCCL Groups (#4988)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
|
2024-05-23 09:54:48 -07:00 |
|
youkaichao
|
e08188081b
|
[Core][Distributed] remove graph mode function (#4818)
|
2024-05-16 10:59:52 -07:00 |
|
Cody Yu
|
973617ae02
|
[Speculative decoding][Re-take] Enable TP>1 speculative decoding (#4840)
Co-authored-by: Cade Daniel <edacih@gmail.com>
Co-authored-by: Cade Daniel <cade@anyscale.com>
|
2024-05-16 00:53:51 -07:00 |
|
youkaichao
|
702bee461f
|
[Core][Distributed] refactor custom allreduce to support multiple tp groups (#4754)
|
2024-05-12 17:47:59 -07:00 |
|
youkaichao
|
4e12131089
|
[Core][Test] fix function name typo in custom allreduce (#4750)
|
2024-05-10 15:14:40 -07:00 |
|