squall/vllm - vllm

Author	SHA1	Message	Date
youkaichao	09c2eb85dd	[ci][distributed] add pipeline parallel correctness test (#6410 )	2024-07-16 15:44:22 -07:00
Cyrus Leung	d97011512e	[CI/Build] vLLM cache directory for images (#6444 )	2024-07-15 23:12:25 -07:00
youkaichao	69672f116c	[core][distributed] simplify code to support pipeline parallel (#6406 )	2024-07-14 21:20:51 -07:00
youkaichao	ccd3c04571	[ci][build] fix commit id (#6420 ) Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2024-07-14 22:16:21 +08:00
youkaichao	41708e5034	[ci] try to add multi-node tests (#6280 ) Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai> Co-authored-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>	2024-07-12 21:51:48 -07:00
Kevin H. Luu	b75bce1008	[ci] Add grouped tests & mark tests to run by default for fastcheck pipeline (#6365 ) Signed-off-by: kevin <kevin@anyscale.com>	2024-07-12 09:58:38 -07:00
Lily Liu	d6ab528997	[Misc] Remove flashinfer warning, add flashinfer tests to CI (#6351 )	2024-07-12 01:32:06 +00:00
Robert Shaw	7ed6a4f0e1	[ BugFix ] Prompt Logprobs Detokenization (#6223 ) Co-authored-by: Zifei Tong <zifeitong@gmail.com>	2024-07-11 22:02:29 +00:00
Lily Liu	69ec3ca14c	[Kernel][Model] logits_soft_cap for Gemma2 with flashinfer (#6051 ) Co-authored-by: Simon Mo <simon.mo@hey.com>	2024-07-04 16:35:51 -07:00
youkaichao	3de6e6a30e	[core][distributed] support n layers % pp size != 0 (#6115 )	2024-07-03 16:40:31 -07:00
Murali Andoorveedu	c5832d2ae9	[Core] Pipeline Parallel Support (#4412 ) Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>	2024-07-02 10:58:08 -07:00
Robert Shaw	d76084c12f	[ CI ] Re-enable Large Model LM Eval (#6031 )	2024-07-01 12:40:45 -04:00
Robert Shaw	deacb7ec44	[ CI ] Temporarily Disable Large LM-Eval Tests (#6005 ) Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic>	2024-06-30 11:56:56 -07:00
SangBin Cho	f5e73c9f1b	[Lora] Use safetensor keys instead of adapter_config.json to find unexpected modules. (#5909 ) Co-authored-by: sang <sangcho@anyscale.com>	2024-06-30 17:11:15 +00:00
youkaichao	2be6955a3f	[ci][distributed] fix device count call [ci][distributed] fix some cuda init that makes it necessary to use spawn (#5991)	2024-06-30 08:06:13 +00:00
Cyrus Leung	9d47f64eb6	[CI/Build] [3/3] Reorganize entrypoints tests (#5966 )	2024-06-30 12:58:49 +08:00
Roger Wang	bcc6a09b63	[CI/Build] Temporarily Remove Phi3-Vision from TP Test (#5989 )	2024-06-30 09:18:31 +08:00
Robert Shaw	75aa1442db	[ CI/Build ] LM Eval Harness Based CI Testing (#5838 ) Co-authored-by: Robert Shaw <rshaw@neuralmagic>	2024-06-29 13:04:30 -04:00
Cyrus Leung	99397da534	[CI/Build] Add TP test for vision models (#5892 )	2024-06-29 15:45:54 +00:00
Lily Liu	7041de4384	[Kernel] Flashinfer for prefill & decode, with Cudagraph support for decode (#4628 ) Co-authored-by: LiuXiaoxuanPKU <llilyliupku@gmail.com>, bong-furiosa <bongwon.jang@furiosa.ai>	2024-06-28 15:28:49 -07:00
xwjiang2010	d12af207d2	[VLM][Bugfix] Make sure that `multi_modal_kwargs` is broadcasted properly (#5880 ) Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>	2024-06-27 15:15:24 +08:00
Woo-Yeon Lee	2ce5d6688b	[Speculative Decoding] Support draft model on different tensor-parallel size than target model (#5414 )	2024-06-25 09:56:06 +00:00
Kevin H. Luu	e9de9dd551	[ci] Remove aws template (#5757 ) Signed-off-by: kevin <kevin@anyscale.com>	2024-06-24 21:09:02 -07:00
youkaichao	7187507301	[ci][test] fix ca test in main (#5746 )	2024-06-21 14:04:26 -07:00
youkaichao	d9a252bc8e	[Core][Distributed] add shm broadcast (#5399 ) Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>	2024-06-21 05:12:35 +00:00
youkaichao	6c5b7af152	[distributed][misc] use fork by default for mp (#5669 )	2024-06-20 17:06:34 -07:00
Kevin H. Luu	949e49a685	[ci] Limit num gpus if specified for A100 (#5694 ) Signed-off-by: kevin <kevin@anyscale.com>	2024-06-19 16:30:03 -07:00
youkaichao	d571ca0108	[ci][distributed] add tests for custom allreduce (#5689 )	2024-06-19 20:16:04 +00:00
Kevin H. Luu	3ee5c4bca5	[ci] Add A100 queue into AWS CI template (#5648 ) Signed-off-by: kevin <kevin@anyscale.com>	2024-06-19 08:42:13 -06:00
Ronen Schaffer	7879f24dcc	[Misc] Add OpenTelemetry support (#4687 ) This PR adds basic support for OpenTelemetry distributed tracing. It includes changes to enable tracing functionality and improve monitoring capabilities. I've also added a markdown document with screenshots to guide users on how to use this feature.	2024-06-19 01:17:03 +09:00
Kevin H. Luu	13db4369d9	[ci] Deprecate original CI template (#5624 ) Signed-off-by: kevin <kevin@anyscale.com>	2024-06-18 14:26:20 +00:00
Roger Wang	4ad7b53e59	[CI/Build][Misc] Update Pytest Marker for VLMs (#5623 )	2024-06-18 13:10:04 +00:00
Antoni Baum	f31c1f90e3	Add basic correctness 2 GPU tests to 4 GPU pipeline (#5518 )	2024-06-16 07:48:02 +00:00
Antoni Baum	50eed24d25	Add `cuda_device_count_stateless` (#5473 )	2024-06-13 16:06:49 -07:00
youkaichao	c4bd03c7c5	[Core][Distributed] add same-node detection (#5369 )	2024-06-11 10:53:59 -07:00
Antoni Baum	ccdc490dda	[Core] Change LoRA embedding sharding to support loading methods (#5038 )	2024-06-06 19:07:57 -07:00
Cyrus Leung	89c920785f	[CI/Build] Update vision tests (#5307 )	2024-06-06 05:17:18 -05:00
Simon Mo	3a6ae1d33c	[CI] Disable flash_attn backend for spec decode (#5286 )	2024-06-05 15:49:27 -07:00
Cyrus Leung	ec784b2526	[CI/Build] Add inputs tests (#5215 )	2024-06-03 21:01:46 -07:00
youkaichao	5bd3c65072	[Core][Optimization] remove vllm-nccl (#5091 )	2024-05-29 05:13:52 +00:00
Cyrus Leung	5ae5ed1e60	[Core] Consolidate prompt arguments to LLM engines (#4328 ) Co-authored-by: Roger Wang <ywang@roblox.com>	2024-05-28 13:29:31 -07:00
Alexei-V-Ivanov-AMD	943e72ca56	[Build/CI] Enabling AMD Entrypoints Test (#4834 ) Co-authored-by: Alexey Kondratiev <alexey.kondratiev@amd.com>	2024-05-20 11:29:28 -07:00
SangBin Cho	2e9a2227ec	[Lora] Support long context lora (#4787 ) Currently we need to call the rotary embedding kernel for each LoRA, which makes it hard to serve multiple long-context LoRAs. Add a batched rotary embedding kernel and pipe it through. It replaces the rotary embedding layer with one that is aware of multiple cos-sin caches per scaling factor. Follow-up to https://github.com/vllm-project/vllm/pull/3095/files	2024-05-18 16:05:23 +09:00
Alexei-V-Ivanov-AMD	26148120b3	[Build/CI] Extending the set of AMD tests with Regression, Basic Correctness, Distributed, Engine, Llava Tests (#4797 )	2024-05-16 20:58:25 -07:00
Cody Yu	973617ae02	[Speculative decoding][Re-take] Enable TP>1 speculative decoding (#4840 ) Co-authored-by: Cade Daniel <edacih@gmail.com> Co-authored-by: Cade Daniel <cade@anyscale.com>	2024-05-16 00:53:51 -07:00
Nick Hill	676a99982f	[Core] Add MultiprocessingGPUExecutor (#4539 ) Co-authored-by: SAHIL SUNEJA <suneja@us.ibm.com>	2024-05-14 10:38:59 -07:00
Sanger Steel	8bc68e198c	[Frontend] [Core] perf: Automatically detect vLLM-tensorized model, update `tensorizer` to version 2.9.0 (#4208 )	2024-05-13 14:57:07 -07:00
Cyrus Leung	350f9e107f	[CI/Build] Move `test_utils.py` to `tests/utils.py` (#4425 ) Since #4335 was merged, I've noticed that the definition of ServerRunner in the tests is the same as in the test for the OpenAI API. I have moved the class to the test utilities to avoid code duplication. (Although it has only been repeated twice so far, I will add another similar test suite in #4200, which would duplicate the code a third time.) Also, I have moved the test utilities file (test_utils.py) under the test directory (tests/utils.py), since none of its code is actually used in the main package. Note that I have added __init__.py to each test subpackage and updated the ray.init() call in the test utilities file so that tests/utils.py can be imported with relative imports.	2024-05-13 23:50:09 +09:00
SangBin Cho	f6a593093a	[CI] Make mistral tests pass (#4596 )	2024-05-08 08:44:35 -07:00
Alexei-V-Ivanov-AMD	478aed5827	[Build/CI] Fixing 'docker run' to re-enable AMD CI tests. (#4642 )	2024-05-07 09:23:17 -07:00

83 Commits