squall/vllm

Author	SHA1	Message	Date
Murali Andoorveedu	5d4d90536f	[Distributed] Add send and recv helpers (#5719 )	2024-06-23 14:42:28 -07:00
Varun Sundar Rabindranath	6c916ac8a8	[BugFix] [Kernel] Add Cutlass2x fallback kernels (#5744 ) Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2024-06-23 21:07:11 +00:00
youkaichao	832ea88fcb	[core][distributed] improve shared memory broadcast (#5754 )	2024-06-22 10:00:43 -07:00
Woosuk Kwon	8c00f9c15d	[Docs][TPU] Add installation tip for TPU (#5761 )	2024-06-21 23:09:40 -07:00
Woosuk Kwon	0cbc1d2b4f	[Bugfix] Fix pin_lora error in TPU executor (#5760 )	2024-06-21 22:25:14 -07:00
zifeitong	ff9ddbceee	[Misc] Remove #4789 workaround left in vllm/entrypoints/openai/run_batch.py (#5756 )	2024-06-22 03:33:12 +00:00
Jie Fu (傅杰)	9c62db07ed	[Model] Support Qwen-VL and Qwen-VL-Chat models with text-only inputs (#5710 ) Co-authored-by: Roger Wang <ywang@roblox.com>	2024-06-22 02:07:08 +00:00
Kunshang Ji	cf90ae0123	[CI][Hardware][Intel GPU] add Intel GPU(XPU) ci pipeline (#5616 )	2024-06-21 17:09:34 -07:00
rohithkrn	f5dda63eb5	[LoRA] Add support for pinning lora adapters in the LRU cache (#5603 )	2024-06-21 15:42:46 -07:00
youkaichao	7187507301	[ci][test] fix ca test in main (#5746 )	2024-06-21 14:04:26 -07:00
zhyncs	f1e72cc19a	[BugFix] exclude version 1.15.0 for modelscope (#5668 )	2024-06-21 13:15:48 -06:00
Michael Goin	5b15bde539	[Doc] Documentation on supported hardware for quantization methods (#5745 )	2024-06-21 12:44:29 -04:00
Roger Wang	bd620b01fb	[Kernel][CPU] Add Quick `gelu` to CPU (#5717 )	2024-06-21 06:39:40 +00:00
youkaichao	d9a252bc8e	[Core][Distributed] add shm broadcast (#5399 ) Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>	2024-06-21 05:12:35 +00:00
Jee Li	67005a07bc	[Bugfix] Add fully sharded layer for QKVParallelLinearWithLora (#5665 ) Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>	2024-06-21 04:46:28 +00:00
Chang Su	c35e4a3dd7	[BugFix] Fix test_phi3v.py (#5725 )	2024-06-21 04:45:34 +00:00
Jinzhen Lin	1f5674218f	[Kernel] Add punica dimension for Qwen2 LoRA (#5441 )	2024-06-20 17:55:41 -07:00
Joshua Rosenkranz	b12518d3cf	[Model] MLPSpeculator speculative decoding support (#4947 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com> Co-authored-by: Nick Hill <nickhill@us.ibm.com> Co-authored-by: Davis Wertheimer <Davis.Wertheimer@ibm.com>	2024-06-20 20:23:12 -04:00
youkaichao	6c5b7af152	[distributed][misc] use fork by default for mp (#5669 )	2024-06-20 17:06:34 -07:00
Michael Goin	8065a7e220	[Frontend] Add FlexibleArgumentParser to support both underscore and dash in names (#5718 )	2024-06-20 17:00:13 -06:00
Tyler Michael Smith	3f3b6b2150	[Bugfix] Fix the CUDA version check for FP8 support in the CUTLASS kernels (#5715 )	2024-06-20 18:36:10 +00:00
Varun Sundar Rabindranath	a7dcc62086	[Kernel] Update Cutlass int8 kernel configs for SM80 (#5275 ) Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2024-06-20 13:33:21 +00:00
Roger Wang	ad137cd111	[Model] Port over CLIPVisionModel for VLMs (#5591 )	2024-06-20 11:52:09 +00:00
Varun Sundar Rabindranath	111af1fa2c	[Kernel] Update Cutlass int8 kernel configs for SM90 (#5514 ) Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2024-06-20 06:37:08 +00:00
Roger Wang	1b2eaac316	[Bugfix][Doc] FIx Duplicate Explicit Target Name Errors (#5703 )	2024-06-19 23:10:47 -07:00
Cyrus Leung	3730a1c832	[Misc] Improve conftest (#5681 )	2024-06-19 19:09:21 -07:00
Kevin H. Luu	949e49a685	[ci] Limit num gpus if specified for A100 (#5694 ) Signed-off-by: kevin <kevin@anyscale.com>	2024-06-19 16:30:03 -07:00
Dipika Sikka	4a30d7e3cc	[Misc] Add per channel support for static activation quantization; update w8a8 schemes to share base classes (#5650 )	2024-06-19 18:06:44 -04:00
Rafael Vasquez	e83db9e7e3	[Doc] Update docker references (#5614 ) Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>	2024-06-19 15:01:45 -07:00
zifeitong	78687504f7	[Bugfix] AsyncLLMEngine hangs with asyncio.run (#5654 )	2024-06-19 13:57:12 -07:00
youkaichao	d571ca0108	[ci][distributed] add tests for custom allreduce (#5689 )	2024-06-19 20:16:04 +00:00
Michael Goin	afed90a034	[Frontend][Bugfix] Fix preemption_mode -> preemption-mode for CLI arg in arg_utils.py (#5688 )	2024-06-19 14:41:42 -04:00
Kevin H. Luu	3ee5c4bca5	[ci] Add A100 queue into AWS CI template (#5648 ) Signed-off-by: kevin <kevin@anyscale.com>	2024-06-19 08:42:13 -06:00
Cyrus Leung	e9c2732b97	[CI/Build] Add tqdm to dependencies (#5680 )	2024-06-19 08:37:33 -06:00
DearPlanet	d8714530d1	[Misc]Add param max-model-len in benchmark_latency.py (#5629 )	2024-06-19 18:19:08 +08:00
Isotr0py	7d46c8d378	[Bugfix] Fix sampling_params passed incorrectly in Phi3v example (#5684 )	2024-06-19 17:58:32 +08:00
Michael Goin	da971ec7a5	[Model] Add FP8 kv cache for Qwen2 (#5656 )	2024-06-19 09:38:26 +00:00
youkaichao	3eea74889f	[misc][distributed] use 127.0.0.1 for single-node (#5619 )	2024-06-19 08:05:00 +00:00
Hongxia Yang	f758aed0e8	[Bugfix][CI/Build][AMD][ROCm]Fixed the cmake build bug which generate garbage on certain devices (#5641 )	2024-06-18 23:21:29 -07:00
Thomas Parnell	e5150f2c28	[Bugfix] Added test for sampling repetition penalty bug. (#5659 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2024-06-19 06:03:55 +00:00
Shukant Pal	59a1eb59c9	[Bugfix] Fix Phi-3 Long RoPE scaling implementation (#5628 )	2024-06-19 01:46:38 +00:00
Tyler Michael Smith	6820724e51	[Bugfix] Fix w8a8 benchmarks for int8 case (#5643 )	2024-06-19 00:33:25 +00:00
Tyler Michael Smith	b23ce92032	[Bugfix] Fix CUDA version check for mma warning suppression (#5642 )	2024-06-18 23:48:49 +00:00
milo157	2bd231a7b7	[Doc] Added cerebrium as Integration option (#5553 )	2024-06-18 15:56:59 -07:00
Thomas Parnell	8a173382c8	[Bugfix] Fix for inconsistent behaviour related to sampling and repetition penalties (#5639 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2024-06-18 14:18:37 -07:00
sergey-tinkoff	07feecde1a	[Model] LoRA support added for command-r (#5178 )	2024-06-18 11:01:21 -07:00
Kevin H. Luu	19091efc44	[ci] Setup Release pipeline and build release wheels with cache (#5610 ) Signed-off-by: kevin <kevin@anyscale.com>	2024-06-18 11:00:36 -07:00
Dipika Sikka	95db455e7f	[Misc] Add channel-wise quantization support for w8a8 dynamic per token activation quantization (#5542 )	2024-06-18 12:45:05 -04:00
Ronen Schaffer	7879f24dcc	[Misc] Add OpenTelemetry support (#4687 ) This PR adds basic support for OpenTelemetry distributed tracing. It includes changes to enable tracing functionality and improve monitoring capabilities. I've also added a markdown with print-screens to guide users how to use this feature.	2024-06-19 01:17:03 +09:00
Kevin H. Luu	13db4369d9	[ci] Deprecate original CI template (#5624 ) Signed-off-by: kevin <kevin@anyscale.com>	2024-06-18 14:26:20 +00:00

1687 Commits