squall/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Robert Shaw	d76084c12f	[ CI ] Re-enable Large Model LM Eval (#6031)	2024-07-01 12:40:45 -04:00
Robert Shaw	deacb7ec44	[ CI ] Temporarily Disable Large LM-Eval Tests (#6005) Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic>	2024-06-30 11:56:56 -07:00
SangBin Cho	f5e73c9f1b	[Lora] Use safetensor keys instead of adapter_config.json to find unexpected modules. (#5909) Co-authored-by: sang <sangcho@anyscale.com>	2024-06-30 17:11:15 +00:00
youkaichao	2be6955a3f	[ci][distributed] fix device count call; fix some cuda init that makes it necessary to use spawn (#5991)	2024-06-30 08:06:13 +00:00
Cyrus Leung	9d47f64eb6	[CI/Build] [3/3] Reorganize entrypoints tests (#5966)	2024-06-30 12:58:49 +08:00
Roger Wang	bcc6a09b63	[CI/Build] Temporarily Remove Phi3-Vision from TP Test (#5989)	2024-06-30 09:18:31 +08:00
Robert Shaw	75aa1442db	[ CI/Build ] LM Eval Harness Based CI Testing (#5838) Co-authored-by: Robert Shaw <rshaw@neuralmagic>	2024-06-29 13:04:30 -04:00
Cyrus Leung	99397da534	[CI/Build] Add TP test for vision models (#5892)	2024-06-29 15:45:54 +00:00
Lily Liu	7041de4384	[Kernel] Flashinfer for prefill & decode, with Cudagraph support for decode (#4628) Co-authored-by: LiuXiaoxuanPKU <llilyliupku@gmail.com>, bong-furiosa <bongwon.jang@furiosa.ai>	2024-06-28 15:28:49 -07:00
Ilya Lavrenov	57f09a419c	[Hardware][Intel] OpenVINO vLLM backend (#5379)	2024-06-28 13:50:16 +00:00
xwjiang2010	d12af207d2	[VLM][Bugfix] Make sure that `multi_modal_kwargs` is broadcasted properly (#5880) Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>	2024-06-27 15:15:24 +08:00
Woo-Yeon Lee	2ce5d6688b	[Speculative Decoding] Support draft model on different tensor-parallel size than target model (#5414)	2024-06-25 09:56:06 +00:00
Kevin H. Luu	e9de9dd551	[ci] Remove aws template (#5757) Signed-off-by: kevin <kevin@anyscale.com>	2024-06-24 21:09:02 -07:00
Kunshang Ji	cf90ae0123	[CI][Hardware][Intel GPU] add Intel GPU(XPU) ci pipeline (#5616)	2024-06-21 17:09:34 -07:00
youkaichao	7187507301	[ci][test] fix ca test in main (#5746)	2024-06-21 14:04:26 -07:00
youkaichao	d9a252bc8e	[Core][Distributed] add shm broadcast (#5399) Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>	2024-06-21 05:12:35 +00:00
youkaichao	6c5b7af152	[distributed][misc] use fork by default for mp (#5669)	2024-06-20 17:06:34 -07:00
Kevin H. Luu	949e49a685	[ci] Limit num gpus if specified for A100 (#5694) Signed-off-by: kevin <kevin@anyscale.com>	2024-06-19 16:30:03 -07:00
youkaichao	d571ca0108	[ci][distributed] add tests for custom allreduce (#5689)	2024-06-19 20:16:04 +00:00
Kevin H. Luu	3ee5c4bca5	[ci] Add A100 queue into AWS CI template (#5648) Signed-off-by: kevin <kevin@anyscale.com>	2024-06-19 08:42:13 -06:00
Kevin H. Luu	19091efc44	[ci] Setup Release pipeline and build release wheels with cache (#5610) Signed-off-by: kevin <kevin@anyscale.com>	2024-06-18 11:00:36 -07:00
Ronen Schaffer	7879f24dcc	[Misc] Add OpenTelemetry support (#4687) This PR adds basic support for OpenTelemetry distributed tracing. It includes changes to enable tracing functionality and improve monitoring capabilities. I've also added a markdown with print-screens to guide users how to use this feature.	2024-06-19 01:17:03 +09:00
Kevin H. Luu	13db4369d9	[ci] Deprecate original CI template (#5624) Signed-off-by: kevin <kevin@anyscale.com>	2024-06-18 14:26:20 +00:00
Roger Wang	4ad7b53e59	[CI/Build][Misc] Update Pytest Marker for VLMs (#5623)	2024-06-18 13:10:04 +00:00
Kuntai Du	114d7270ff	[CI] Avoid naming different metrics with the same name in performance benchmark (#5615)	2024-06-17 21:37:18 -07:00
Cyrus Leung	32c86e494a	[Misc] Fix typo (#5618)	2024-06-17 20:58:30 -07:00
Kuntai Du	9e4e6fe207	[CI] Improve the readability of performance benchmarking results and prepare for upcoming performance dashboard (#5571)	2024-06-17 11:41:08 -07:00
Jie Fu (傅杰)	ab66536dbf	[CI/BUILD] Support non-AVX512 vLLM building and testing (#5574)	2024-06-17 14:36:10 -04:00
Kunshang Ji	728c4c8a06	[Hardware][Intel GPU] Add Intel GPU(XPU) inference backend (#3814) Co-authored-by: Jiang Li <jiang1.li@intel.com> Co-authored-by: Abhilash Majumder <abhilash.majumder@intel.com> Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>	2024-06-17 11:01:25 -07:00
Antoni Baum	f31c1f90e3	Add basic correctness 2 GPU tests to 4 GPU pipeline (#5518)	2024-06-16 07:48:02 +00:00
Simon Mo	bd7efe95d0	Add ccache to amd (#5555)	2024-06-14 17:18:22 -07:00
Cyrus Leung	d47af2bc02	[CI/Build] Disable LLaVA-NeXT CPU test (#5529)	2024-06-14 09:27:30 -07:00
Kuntai Du	319ad7f1d3	[CI/Build][Misc] Add CI that benchmarks vllm performance on those PRs with `perf-benchmarks` label (#5073) Co-authored-by: simon-mo <simon.mo@hey.com>	2024-06-13 22:36:20 -07:00
Antoni Baum	50eed24d25	Add `cuda_device_count_stateless` (#5473)	2024-06-13 16:06:49 -07:00
Kevin H. Luu	916d219d62	[ci] Use sccache to build images (#5419) Signed-off-by: kevin <kevin@anyscale.com>	2024-06-12 17:58:12 -07:00
Kevin H. Luu	8b82a89997	[ci] Add AMD, Neuron, Intel tests for AWS CI and turn off default soft fail for GPU tests (#5464) Signed-off-by: kevin <kevin@anyscale.com>	2024-06-12 14:00:18 -07:00
youkaichao	c4bd03c7c5	[Core][Distributed] add same-node detection (#5369)	2024-06-11 10:53:59 -07:00
Kevin H. Luu	76477a93b7	[ci] Fix Buildkite agent path (#5392) Signed-off-by: kevin <kevin@anyscale.com>	2024-06-10 18:58:07 -07:00
Kevin H. Luu	c5602f0baa	[ci] Mount buildkite agent on Docker container to upload benchmark results (#5330) Signed-off-by: kevin <kevin@anyscale.com>	2024-06-10 09:22:34 -07:00
Kevin H. Luu	f7f9c5f97b	[ci] Use small_cpu_queue for doc build (#5331) Signed-off-by: kevin <kevin@anyscale.com>	2024-06-10 09:21:11 -07:00
Antoni Baum	ccdc490dda	[Core] Change LoRA embedding sharding to support loading methods (#5038)	2024-06-06 19:07:57 -07:00
Cyrus Leung	89c920785f	[CI/Build] Update vision tests (#5307)	2024-06-06 05:17:18 -05:00
Simon Mo	3a6ae1d33c	[CI] Disable flash_attn backend for spec decode (#5286)	2024-06-05 15:49:27 -07:00
Tyler Michael Smith	02cc3b51a7	[misc] benchmark_serving.py -- add ITL results and tweak TPOT results (#5263)	2024-06-05 10:17:51 -07:00
Simon Mo	d5b1eb081e	[CI] Add nightly benchmarks (#5260)	2024-06-05 09:42:08 -07:00
Simon Mo	9ca62d8668	[CI] mark AMD test as softfail to prevent blockage (#5256)	2024-06-04 11:34:53 -07:00
Li, Jiang	45c35f0d58	[CI/Build] Reducing CPU CI execution time (#5241)	2024-06-04 10:26:40 -07:00
Cyrus Leung	ec784b2526	[CI/Build] Add inputs tests (#5215)	2024-06-03 21:01:46 -07:00
Kevin H. Luu	4f0d17c05c	New CI template on AWS stack (#5110) Signed-off-by: kevin <kevin@anyscale.com>	2024-06-03 16:16:43 -07:00
Yuan	cafb8e06c5	[CI/BUILD] enable intel queue for longer CPU tests (#4113)	2024-06-03 10:39:50 -07:00

119 Commits