squall/vllm - vllm

Author	SHA1	Message	Date
SangBin Cho	c01a6cb231	[Ray backend] Better error when pg topology is bad. (#7584 ) Co-authored-by: youkaichao <youkaichao@126.com>	2024-08-22 17:44:25 -07:00
youkaichao	8c6f694a79	[ci] refine dependency for distributed tests (#7776 )	2024-08-22 00:54:15 -07:00
Luka Govedič	7937009a7e	[Kernel] Replaced `blockReduce[...]` functions with `cub::BlockReduce` (#7233 ) Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-08-21 20:18:00 -04:00
William Lin	5844017285	[ci] [multi-step] narrow multi-step test dependency paths (#7760 )	2024-08-21 15:52:40 -07:00
Robert Shaw	f7e3b0c5aa	[Bugfix][Frontend] Fix Issues Under High Load With `zeromq` Frontend (#7394 ) Co-authored-by: Nick Hill <nickhill@us.ibm.com>	2024-08-21 13:34:14 -04:00
Ronen Schaffer	2aa00d59ad	[CI/Build] Pin OpenTelemetry versions and make availability errors clearer (#7266 )	2024-08-20 10:02:21 -07:00
Kuntai Du	3d8a5f063d	[CI] Organizing performance benchmark files (#7616 )	2024-08-19 22:43:54 -07:00
William Lin	47b65a5508	[core] Multi Step Scheduling (#7000 ) Co-authored-by: afeldman-nm <156691304+afeldman-nm@users.noreply.github.com>	2024-08-19 13:52:13 -07:00
Peng Guanwen	f710fb5265	[Core] Use flashinfer sampling kernel when available (#7137 ) Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-08-19 03:24:03 +00:00
Alex Brooks	40e1360bb6	[CI/Build] Add text-only test for Qwen models (#7475 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2024-08-19 07:43:46 +08:00
SangBin Cho	4706eb628e	[aDAG] Unflake aDAG + PP tests (#7600 )	2024-08-16 20:49:30 -07:00
Alexei-V-Ivanov-AMD	6bd19551b0	.[Build/CI] Enabling passing AMD tests. (#7610 )	2024-08-16 20:25:32 -07:00
Michael Goin	44f26a9466	[Model] Align nemotron config with final HF state and fix lm-eval-small (#7611 )	2024-08-16 15:56:34 -07:00
Mahesh Keralapura	93478b63d2	[Core] Fix tracking of model forward time to the span traces in case of PP>1 (#7440 )	2024-08-16 13:46:01 -07:00
Kuntai Du	6fc5b0f249	[CI] Fix crashes of performance benchmark (#7500 )	2024-08-16 08:08:45 -07:00
youkaichao	54bd9a03c4	register custom op for flash attn and use from torch.ops (#7536 )	2024-08-15 22:38:56 -07:00
nunjunj	3b19e39dc5	Chat method for offline llm (#5049 ) Co-authored-by: nunjunj <ray@g-3ff9f30f2ed650001.c.vllm-405802.internal> Co-authored-by: nunjunj <ray@g-1df6075697c3f0001.c.vllm-405802.internal> Co-authored-by: nunjunj <ray@g-c5a2c23abc49e0001.c.vllm-405802.internal> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-08-15 19:41:34 -07:00
youkaichao	4cd7d47fed	[ci/test] rearrange tests and make adag test soft fail (#7572 )	2024-08-15 19:39:04 -07:00
PHILO-HE	f4da5f7b6d	[Misc] Update dockerfile for CPU to cover protobuf installation (#7182 )	2024-08-15 10:03:01 -07:00
youkaichao	d3d9cb6e4b	[ci] fix model tests (#7507 )	2024-08-14 01:01:43 -07:00
Cyrus Leung	dd164d72f3	[Bugfix][Docs] Update list of mock imports (#7493 )	2024-08-13 20:37:30 -07:00
youkaichao	ea49e6a3c8	[misc][ci] fix cpu test with plugins (#7489 )	2024-08-13 19:27:46 -07:00
youkaichao	16422ea76f	[misc][plugin] add plugin system implementation (#7426 )	2024-08-13 16:24:17 -07:00
Dipika Sikka	fb377d7e74	[Misc] Update `gptq_marlin` to use new vLLMParameters (#7281 )	2024-08-13 14:30:11 -04:00
Dipika Sikka	181abbc27d	[Misc] Update LM Eval Tolerance (#7473 )	2024-08-13 14:28:14 -04:00
Kevin H. Luu	65950e8f58	[ci] Entrypoints run upon changes in vllm/ (#7423 ) Signed-off-by: kevin <kevin@anyscale.com>	2024-08-12 10:18:03 -07:00
Lily Liu	ec2affa8ae	[Kernel] Flashinfer correctness fix for v0.1.3 (#7319 )	2024-08-12 07:59:17 +00:00
Kevin H. Luu	469b3bc538	[ci] Make building wheels per commit optional (#7278 ) Signed-off-by: kevin <kevin@anyscale.com>	2024-08-07 11:34:25 -07:00
afeldman-nm	fd95e026e0	[Core] Subclass ModelRunner to support cross-attention & encoder sequences (towards eventual encoder/decoder model support) (#4942 ) Co-authored-by: Andrew Feldman <afeld2012@gmail.com> Co-authored-by: Nick Hill <nickhill@us.ibm.com>	2024-08-06 16:51:47 -04:00
Dipika Sikka	a3bbbfa1d8	[BugFix] Fix DeepSeek remote code (#7178 )	2024-08-06 08:16:53 -07:00
Simon Mo	e3c664bfcb	[Build] Add initial conditional testing spec (#6841 )	2024-08-05 17:39:22 -07:00
Kuntai Du	67d745cc68	[CI] Temporarily turn off H100 performance benchmark (#7104 )	2024-08-02 23:52:44 -07:00
youkaichao	04e5583425	[ci][distributed] merge distributed test commands (#7097 ) Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-08-02 21:33:53 -07:00
omkar kakarparthi	562e580abc	Update run-amd-test.sh (#7044 )	2024-08-01 13:12:37 -07:00
Sage Moore	7e0861bd0b	[CI/Build] Update PyTorch to 2.4.0 (#6951 ) Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-08-01 11:11:24 -07:00
Alexei-V-Ivanov-AMD	a72a424b3e	[Build/CI] Fixing Docker Hub quota issue. (#7043 )	2024-08-01 11:07:37 -07:00
HandH1998	6512937de1	Support W4A8 quantization for vllm (#5218 )	2024-07-31 07:55:21 -06:00
Cyrus Leung	f230cc2ca6	[Bugfix] Fix broadcasting logic for `multi_modal_kwargs` (#6836 )	2024-07-31 10:38:45 +08:00
Cade Daniel	c32ab8be1a	[Speculative decoding] Add serving benchmark for llama3 70b + speculative decoding (#6964 )	2024-07-31 00:53:21 +00:00
Cade Daniel	fb4f530bf5	[CI] [nightly benchmark] Do not re-download sharegpt dataset if exists (#6706 )	2024-07-30 16:28:49 -07:00
Cade Daniel	79319cedfa	[Nightly benchmarking suite] Remove pkill python from run benchmark suite (#6965 )	2024-07-30 16:28:05 -07:00
Simon Mo	40c27a7cbb	[Build] Temporarily Disable Kernels and LoRA tests (#6961 )	2024-07-30 14:59:48 -07:00
Roger Wang	ecb33a28cb	[CI/Build][Doc] Update CI and Doc for VLM example changes (#6860 )	2024-07-27 09:54:14 +00:00
Joe	14dbd5a767	[Model] H2O Danube3-4b (#6451 )	2024-07-26 20:47:50 -07:00
Sanger Steel	969d032265	[Bugfix]: Fix Tensorizer test failures (#6835 )	2024-07-26 20:02:25 -07:00
Li, Jiang	3bbb4936dc	[Hardware] [Intel] Enable Multiprocessing and tensor parallel in CPU backend and update documentation (#6125 )	2024-07-26 13:50:10 -07:00
Michael Goin	07278c37dd	[Model] Support Nemotron models (Nemotron-3, Nemotron-4, Minitron) (#6611 )	2024-07-26 14:33:42 -04:00
Kevin H. Luu	2eb9f4ff26	[ci] Mark tensorizer test as soft fail and separate it from grouped test in fast check (#6810 ) Signed-off-by: kevin <kevin@anyscale.com>	2024-07-25 18:08:33 -07:00
Kuntai Du	6a1e25b151	[Doc] Add documentations for nightly benchmarks (#6412 )	2024-07-25 11:57:16 -07:00
Robert Shaw	889da130e7	[ Misc ] `fp8-marlin` channelwise via `compressed-tensors` (#6524 ) Co-authored-by: mgoin <michael@neuralmagic.com>	2024-07-25 09:46:04 -07:00

218 Commits