Commit Graph

85 Commits

Author SHA1 Message Date
youkaichao
663874e048
[torch.compile] improve allreduce registration (#9061) 2024-10-04 16:43:50 -07:00
youkaichao
18e60d7d13
[misc][distributed] add VLLM_SKIP_P2P_CHECK flag (#8911) 2024-09-27 14:27:56 -07:00
Russell Bryant
b05f5c9238
[Core] Allow IPv6 in VLLM_HOST_IP with zmq (#8575)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-09-23 12:15:41 -07:00
Kunshang Ji
d4bf085ad0
[MISC] add support custom_op check (#8557)
Co-authored-by: youkaichao <youkaichao@126.com>
2024-09-20 19:03:55 -07:00
Russell Bryant
d65798f78c
[Core] zmq: bind only to 127.0.0.1 for local-only usage (#8543)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-09-18 16:10:27 +00:00
Cyrus Leung
6ffa3f314c
[CI/Build] Avoid CUDA initialization (#8534) 2024-09-18 10:38:11 +00:00
youkaichao
99aa4eddaf
[torch.compile] register allreduce operations as custom ops (#8526) 2024-09-16 22:57:57 -07:00
Richard Liu
2148441fd3
[TPU] Support single and multi-host TPUs on GKE (#7613) 2024-08-30 00:27:40 -07:00
youkaichao
05826c887b
[misc] fix custom allreduce p2p cache file generation (#7853) 2024-08-26 15:02:25 -07:00
youkaichao
d95cc0a55c
[core][misc] update libcudart finding (#7620)
Co-authored-by: cjackal <44624812+cjackal@users.noreply.github.com>
2024-08-16 23:01:35 -07:00
bnellnm
e680349994
[Bugfix] Fix custom_ar support check (#7617) 2024-08-16 19:05:49 -07:00
Woosuk Kwon
59edd0f134
[Bugfix][CI] Import ray under guard (#7486) 2024-08-13 17:12:58 -07:00
Woosuk Kwon
a08df8322e
[TPU] Support multi-host inference (#7457) 2024-08-13 16:31:20 -07:00
Cyrus Leung
7025b11d94
[Bugfix] Fix weight loading for Chameleon when TP>1 (#7410) 2024-08-13 05:33:41 +00:00
youkaichao
639159b2a6
[distributed][misc] add specialized method for cuda platform (#7249) 2024-08-07 08:54:52 -07:00
Rui Qiao
997cf78308
[Misc] Fix typo in GroupCoordinator.recv() (#7167)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2024-08-05 11:10:16 -07:00
youkaichao
16a1cc9bb2
[misc][distributed] improve libcudart.so finding (#7127) 2024-08-04 11:31:51 -07:00
Aurick Qiao
0437492ea9
PP comm optimization: replace send with partial send + allgather (#6695)
Co-authored-by: Aurick Qiao <aurick.qiao@snowflake.com>
2024-07-31 20:15:42 -07:00
Cody Yu
bd70013407
[MISC] Introduce pipeline parallelism partition strategies (#6920)
Co-authored-by: youkaichao <youkaichao@126.com>
2024-07-31 12:02:17 -07:00
Cyrus Leung
f230cc2ca6
[Bugfix] Fix broadcasting logic for multi_modal_kwargs (#6836) 2024-07-31 10:38:45 +08:00
Woosuk Kwon
fad5576c58
[TPU] Reduce compilation time & Upgrade PyTorch XLA version (#6856) 2024-07-27 10:28:33 -07:00
Woosuk Kwon
d09b94ca58
[TPU] Support collective communications in XLA devices (#6813) 2024-07-27 01:45:57 +00:00
Li, Jiang
3bbb4936dc
[Hardware] [Intel] Enable Multiprocessing and tensor parallel in CPU backend and update documentation (#6125) 2024-07-26 13:50:10 -07:00
Tyler Michael Smith
95db75de64
[Bugfix] Add synchronize to prevent possible data race (#6788)
Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
2024-07-25 10:40:01 -07:00
youkaichao
740374d456
[core][distributed] fix zmq hang (#6759) 2024-07-24 17:37:12 -07:00
Woosuk Kwon
b6df37f943
[Misc] Remove abused noqa (#6619) 2024-07-21 23:47:04 +08:00
youkaichao
07eb6f19f3
[bugfix][distributed] fix multi-node bug for shared memory (#6597) 2024-07-19 15:34:34 -07:00
Nick Hill
d25877dd9b
[BugFix] Avoid secondary error in ShmRingBuffer destructor (#6530) 2024-07-17 22:24:43 -07:00
youkaichao
7f62077af5
[misc][distributed] improve tests (#6488) 2024-07-16 17:35:52 -07:00
Cyrus Leung
d97011512e
[CI/Build] vLLM cache directory for images (#6444) 2024-07-15 23:12:25 -07:00
youkaichao
2b0fb53481
[distributed][misc] keep consistent with how pytorch finds libcudart.so (#6346) 2024-07-11 19:35:17 -07:00
youkaichao
da78caecfa
[core][distributed] add zmq fallback for broadcasting large objects (#6183) 2024-07-09 18:49:11 -07:00
youkaichao
3de6e6a30e
[core][distributed] support n layers % pp size != 0 (#6115) 2024-07-03 16:40:31 -07:00
youkaichao
3c6325f0fc
[core][distributed] custom allreduce when pp size > 1 (#6117) 2024-07-03 14:41:32 -07:00
Murali Andoorveedu
c5832d2ae9
[Core] Pipeline Parallel Support (#4412)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
2024-07-02 10:58:08 -07:00
youkaichao
614aa51203
[misc][cuda] use nvml to avoid accidentally cuda initialization (#6007) 2024-06-30 20:07:34 -07:00
Cyrus Leung
99397da534
[CI/Build] Add TP test for vision models (#5892) 2024-06-29 15:45:54 +00:00
xwjiang2010
b90d8cd832
[Distributed] Make it clear that % should not be in tensor dict keys. (#5927)
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
2024-06-28 15:20:22 +00:00
xwjiang2010
74d55c065b
[VLM][BugFix] Make sure that multi_modal_kwargs can broadcast properly with ring buffer. (#5905)
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-06-28 07:29:13 +00:00
xwjiang2010
d12af207d2
[VLM][Bugfix] Make sure that multi_modal_kwargs is broadcasted properly (#5880)
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
2024-06-27 15:15:24 +08:00
youkaichao
515080ad2f
[bugfix][distributed] fix shm broadcast when the queue size is full (#5801) 2024-06-25 21:56:02 -07:00
Matt Wong
dd793d1de5
[Hardware][AMD][CI/Build][Doc] Upgrade to ROCm 6.1, Dockerfile improvements, test fixes (#5422) 2024-06-25 15:56:15 -07:00
Woo-Yeon Lee
2ce5d6688b
[Speculative Decoding] Support draft model on different tensor-parallel size than target model (#5414) 2024-06-25 09:56:06 +00:00
Murali Andoorveedu
5d4d90536f
[Distributed] Add send and recv helpers (#5719) 2024-06-23 14:42:28 -07:00
youkaichao
832ea88fcb
[core][distributed] improve shared memory broadcast (#5754) 2024-06-22 10:00:43 -07:00
youkaichao
d9a252bc8e
[Core][Distributed] add shm broadcast (#5399)
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
2024-06-21 05:12:35 +00:00
youkaichao
6c5b7af152
[distributed][misc] use fork by default for mp (#5669) 2024-06-20 17:06:34 -07:00
youkaichao
db5ec52ad7
[bugfix][distributed] do not error if two processes do not agree on p2p capability (#5612) 2024-06-18 07:21:05 +00:00
Kunshang Ji
728c4c8a06
[Hardware][Intel GPU] Add Intel GPU(XPU) inference backend (#3814)
Co-authored-by: Jiang Li <jiang1.li@intel.com>
Co-authored-by: Abhilash Majumder <abhilash.majumder@intel.com>
Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
2024-06-17 11:01:25 -07:00
Cyrus Leung
0e9164b40a
[mypy] Enable type checking for test directory (#5017) 2024-06-15 04:45:31 +00:00