Commit Graph

100 Commits

Author SHA1 Message Date
Hanzhi Zhou
6192e9b8fe
[Core][Distributed] Refactor ipc buffer init in CustomAllreduce (#10030)
Signed-off-by: Hanzhi Zhou <hanzhi713@gmail.com>
2024-11-06 23:50:47 -08:00
youkaichao
719c1ca468
[core][distributed] add stateless_init_process_group (#10072)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-06 16:42:09 -08:00
youkaichao
4be3a45158
[distributed] add function to create ipc buffers directly (#10064)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-05 22:35:03 -08:00
Yongzao
2b5bf20988
[torch.compile] Adding torch compile annotations to some models (#9876)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2024-11-01 00:25:47 -07:00
Yongzao
d27cfbf791
[torch.compile] Adding torch compile annotations to some models (#9641)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2024-10-24 09:31:42 -07:00
Yongzao
8a02cd045a
[torch.compile] Adding torch compile annotations to some models (#9639)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2024-10-24 00:54:57 -07:00
Cyrus Leung
836e8ef6ee
[Bugfix] Fix PP for ChatGLM and Molmo (#9422) 2024-10-24 06:12:05 +00:00
Yunfei Chu
fc6c274626
[Model] Add Qwen2-Audio model support (#9248)
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-10-23 17:54:22 +00:00
Travis Johnson
b729901139
[Bugfix]: serialize config by value for --trust-remote-code (#6751)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-10-21 19:46:24 -07:00
Cyrus Leung
051eaf6db3
[Model] Add user-configurable task for models that support both generation and embedding (#9424) 2024-10-18 11:31:58 -07:00
Cyrus Leung
b22b798471
[Model] PP support for embedding models and update docs (#9090)
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-10-06 16:35:27 +08:00
Murali Andoorveedu
0f6d7a9a34
[Models] Add remaining model PP support (#7168)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Signed-off-by: Murali Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-10-04 10:56:58 +08:00
Isotr0py
bc4eb65b54
[Bugfix] Fix Fuyu tensor parallel inference (#8986) 2024-10-01 17:51:41 +08:00
Nick Hill
4b377d6feb
[BugFix] Fix test breakages from transformers 4.45 upgrade (#8829) 2024-09-26 16:46:43 -07:00
Yanyi Liu
a79e522984
[Model] Support pp for qwen2-vl (#8696) 2024-09-23 13:46:59 +00:00
Cyrus Leung
a84e598e21
[CI/Build] Reorganize models tests (#7820) 2024-09-13 10:20:06 -07:00
Isotr0py
9b4a3b235e
[CI/Build] Enable InternVL2 PP test only on single node (#8437) 2024-09-13 06:35:20 +00:00
Roger Wang
b61bd98f90
[CI/Build] Disable multi-node test for InternVL2 (#8428) 2024-09-12 15:05:35 -07:00
Isotr0py
1230263e16
[Bugfix] Fix InternVL2 vision embeddings process with pipeline parallel (#8299) 2024-09-11 10:11:01 +08:00
Cyrus Leung
2f707fcb35
[Model] Multi-input support for LLaVA (#8238) 2024-09-07 02:57:24 +00:00
manikandan.tm@zucisystems.com
8685ba1a1e
Inclusion of InternVLChatModel In PP_SUPPORTED_MODELS(Pipeline Parallelism) (#7860) 2024-09-05 11:33:37 +00:00
SangBin Cho
c01a6cb231
[Ray backend] Better error when pg topology is bad. (#7584)
Co-authored-by: youkaichao <youkaichao@126.com>
2024-08-22 17:44:25 -07:00
Peter Salas
1ca0d4f86b
[Model] Add UltravoxModel and UltravoxConfig (#7615) 2024-08-21 22:49:39 +00:00
SangBin Cho
ff7ec82c4d
[Core] Optimize SPMD architecture with delta + serialization optimization (#7109) 2024-08-18 17:57:20 -07:00
SangBin Cho
4706eb628e
[aDAG] Unflake aDAG + PP tests (#7600) 2024-08-16 20:49:30 -07:00
jon-chuang
50b8d08dbd
[Misc/Testing] Use torch.testing.assert_close (#7324) 2024-08-16 04:24:04 +00:00
youkaichao
4cd7d47fed
[ci/test] rearrange tests and make adag test soft fail (#7572) 2024-08-15 19:39:04 -07:00
youkaichao
16422ea76f
[misc][plugin] add plugin system implementation (#7426) 2024-08-13 16:24:17 -07:00
Cyrus Leung
7025b11d94
[Bugfix] Fix weight loading for Chameleon when TP>1 (#7410) 2024-08-13 05:33:41 +00:00
Rui Qiao
198d6a2898
[Core] Shut down aDAG workers with clean async llm engine exit (#7224)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2024-08-12 17:57:16 -07:00
Cyrus Leung
7eb4a51c5f
[Core] Support serving encoder/decoder models (#7258) 2024-08-09 10:39:41 +08:00
afeldman-nm
fd95e026e0
[Core] Subclass ModelRunner to support cross-attention & encoder sequences (towards eventual encoder/decoder model support) (#4942)
Co-authored-by: Andrew Feldman <afeld2012@gmail.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2024-08-06 16:51:47 -04:00
youkaichao
a0d164567c
[ci][distributed] disable ray dag tests (#7099) 2024-08-02 22:32:04 -07:00
youkaichao
04e5583425
[ci][distributed] merge distributed test commands (#7097)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-08-02 21:33:53 -07:00
Rui Qiao
05308891e2
[Core] Pipeline parallel with Ray ADAG (#6837)
Support pipeline-parallelism with Ray accelerated DAG.

Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2024-08-02 13:55:40 -07:00
youkaichao
252357793d
[ci][distributed] try to fix pp test (#7054) 2024-08-01 22:03:12 -07:00
Cody Yu
bd70013407
[MISC] Introduce pipeline parallelism partition strategies (#6920)
Co-authored-by: youkaichao <youkaichao@126.com>
2024-07-31 12:02:17 -07:00
Cyrus Leung
f230cc2ca6
[Bugfix] Fix broadcasting logic for multi_modal_kwargs (#6836) 2024-07-31 10:38:45 +08:00
youkaichao
443c7cf4cf
[ci][distributed] fix flaky tests (#6806) 2024-07-25 17:44:09 -07:00
William Lin
5e8ca973eb
[Bugfix] fix flashinfer cudagraph capture for PP (#6708) 2024-07-24 01:49:44 +00:00
Nick Hill
b5672a112c
[Core] Multiprocessing Pipeline Parallel support (#6130)
Co-authored-by: Murali Andoorveedu <muralidhar.andoorveedu@centml.ai>
2024-07-18 19:15:52 -07:00
youkaichao
f53b8f0d05
[ci][test] add correctness test for cpu offloading (#6549) 2024-07-18 23:41:06 +00:00
Cody Yu
b5af8c223c
[Model] Pipeline parallel support for Mixtral (#6516) 2024-07-17 19:26:04 -07:00
Murali Andoorveedu
5fa6e9876e
[Bugfix] Fix for multinode crash on 4 PP (#6495)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
2024-07-17 08:25:10 +00:00
Cyrus Leung
5bf35a91e4
[Doc][CI/Build] Update docs and tests to use vllm serve (#6431) 2024-07-17 07:43:21 +00:00
youkaichao
7f62077af5
[misc][distributed] improve tests (#6488) 2024-07-16 17:35:52 -07:00
youkaichao
09c2eb85dd
[ci][distributed] add pipeline parallel correctness test (#6410) 2024-07-16 15:44:22 -07:00
youkaichao
41708e5034
[ci] try to add multi-node tests (#6280)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
2024-07-12 21:51:48 -07:00
Hongxia Yang
b6c16cf8ff
[ROCm][AMD] unify CUDA_VISIBLE_DEVICES usage in cuda/rocm (#6352) 2024-07-11 21:30:46 -07:00
youkaichao
da78caecfa
[core][distributed] zmq fallback for broadcasting large objects (#6183)
[core][distributed] add zmq fallback for broadcasting large objects (#6183)
2024-07-09 18:49:11 -07:00