squall/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
sroy745	f3a507f1d3	[Core] Add an environment variable which needs to be set explicitly to allow BlockSpaceManagerV1 (#9149 )	2024-10-10 14:17:17 +08:00
youkaichao	c8627cd41b	[ci][test] use load dummy for testing (#9165 )	2024-10-09 00:38:40 -07:00
Michael Goin	9ba0bd6aa6	Add `lm-eval` directly to requirements-test.txt (#9161 )	2024-10-08 18:22:31 -07:00
Murali Andoorveedu	0f6d7a9a34	[Models] Add remaining model PP support (#7168 ) Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai> Signed-off-by: Murali Andoorveedu <muralidhar.andoorveedu@centml.ai> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-10-04 10:56:58 +08:00
Lily Liu	1570203864	[Spec Decode] (1/2) Remove batch expansion (#8839 )	2024-10-01 16:04:42 -07:00
Lily Liu	bce324487a	[CI][SpecDecode] Fix spec decode tests, use flash attention backend for spec decode CI tests. (#8975 )	2024-10-01 00:51:40 +00:00
Kevin H. Luu	1425a1bcf9	[ci] Add CODEOWNERS for test directories (#8795 ) Signed-off-by: kevin <kevin@anyscale.com>	2024-10-01 00:47:08 +00:00
Cyrus Leung	e1a3f5e831	[CI/Build] Update models tests & examples (#8874 ) Co-authored-by: Roger Wang <ywang@roblox.com>	2024-09-28 09:54:35 -07:00
Nick Hill	4b377d6feb	[BugFix] Fix test breakages from transformers 4.45 upgrade (#8829 )	2024-09-26 16:46:43 -07:00
Kevin H. Luu	d9cfbc891e	[ci] Soft fail Entrypoints, Samplers, LoRA, Decoder-only VLM (#8872 ) Signed-off-by: kevin <kevin@anyscale.com>	2024-09-26 15:02:16 -07:00
bnellnm	300da09177	[Kernel] Fullgraph and opcheck tests (#8479 )	2024-09-25 08:35:52 -06:00
Hongxia Yang	1c046447a6	[CI/Build][Bugfix][Doc][ROCm] CI fix and doc update after ROCm 6.2 upgrade (#8777 )	2024-09-25 22:26:37 +08:00
Alexey Kondratiev(AMD)	6cb748e190	[CI/Build] Re-enabling Entrypoints tests on ROCm, excluding ones that fail (#8551 )	2024-09-19 13:06:32 -07:00
Alexander Matveev	7c7714d856	[Core][Bugfix][Perf] Introduce `MQLLMEngine` to avoid `asyncio` OH (#8157 ) Co-authored-by: Nick Hill <nickhill@us.ibm.com> Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com> Co-authored-by: Simon Mo <simon.mo@hey.com>	2024-09-18 13:56:58 +00:00
sroy745	1009e93c5d	[Encoder decoder] Add cuda graph support during decoding for encoder-decoder models (#7631 )	2024-09-17 07:35:01 -07:00
youkaichao	99aa4eddaf	[torch.compile] register allreduce operations as custom ops (#8526 )	2024-09-16 22:57:57 -07:00
Cyrus Leung	a84e598e21	[CI/Build] Reorganize models tests (#7820 )	2024-09-13 10:20:06 -07:00
Nick Hill	551ce01078	[Core] Add engine option to return only deltas or final output (#7381 )	2024-09-12 12:02:00 -07:00
Joe Runde	f2e263b801	[Bugfix] Offline mode fix (#8376 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2024-09-12 11:11:57 -07:00
Lily Liu	775f00f81e	[Speculative Decoding] Test refactor (#8317 ) Co-authored-by: youkaichao <youkaichao@126.com>	2024-09-11 14:07:34 -07:00
Alexey Kondratiev(AMD)	f421f3cefb	[CI/Build] Enabling kernels tests for AMD, ignoring some of then that fail (#8130 )	2024-09-10 11:51:15 -07:00
Dipika Sikka	6cd5e5b07e	[Misc] Fused MoE Marlin support for GPTQ (#8217 )	2024-09-09 23:02:52 -04:00
Cyrus Leung	288a938872	[Doc] Indicate more information about supported modalities (#8181 )	2024-09-05 10:51:53 +00:00
Kevin H. Luu	ba262c4e5a	[ci] Mark LoRA test as soft-fail (#8160 ) Signed-off-by: kevin <kevin@anyscale.com>	2024-09-04 20:33:12 -07:00
Kyle Mistele	e02ce498be	[Feature] OpenAI-Compatible Tools API + Streaming for Hermes & Mistral models (#5649 ) Co-authored-by: constellate <constellate@1-ai-appserver-staging.codereach.com> Co-authored-by: Kyle Mistele <kyle@constellate.ai>	2024-09-04 13:18:13 -07:00
alexeykondrat	d1dec64243	[CI/Build][ROCm] Enabling LoRA tests on ROCm (#7369 ) Co-authored-by: Simon Mo <simon.mo@hey.com>	2024-09-04 11:57:54 -07:00
Roger Wang	5231f0898e	[Frontend][VLM] Add support for multiple multi-modal items (#8049 )	2024-08-31 16:35:53 -07:00
youkaichao	ce6bf3a2cf	[torch.compile] avoid Dynamo guard evaluation overhead (#7898 ) Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-08-28 16:10:12 -07:00
alexeykondrat	42e932c7d4	[CI/Build][ROCm] Enabling tensorizer tests for ROCm (#7237 )	2024-08-27 10:09:13 -07:00
youkaichao	7d9ffa2ae1	[misc][core] lazy import outlines (#7831 )	2024-08-24 00:51:38 -07:00
Alexander Matveev	9db93de20c	[Core] Add multi-step support to LLMEngine (#7789 )	2024-08-23 12:45:53 -07:00
SangBin Cho	c01a6cb231	[Ray backend] Better error when pg topology is bad. (#7584 ) Co-authored-by: youkaichao <youkaichao@126.com>	2024-08-22 17:44:25 -07:00
youkaichao	8c6f694a79	[ci] refine dependency for distributed tests (#7776 )	2024-08-22 00:54:15 -07:00
William Lin	5844017285	[ci] [multi-step] narrow multi-step test dependency paths (#7760 )	2024-08-21 15:52:40 -07:00
Robert Shaw	f7e3b0c5aa	[Bugfix][Frontend] Fix Issues Under High Load With `zeromq` Frontend (#7394 ) Co-authored-by: Nick Hill <nickhill@us.ibm.com>	2024-08-21 13:34:14 -04:00
Ronen Schaffer	2aa00d59ad	[CI/Build] Pin OpenTelemetry versions and make errors clearer (#7266 ) [CI/Build] Pin OpenTelemetry versions and make a availability errors clearer (#7266)	2024-08-20 10:02:21 -07:00
William Lin	47b65a5508	[core] Multi Step Scheduling (#7000 ) Co-authored-by: afeldman-nm <156691304+afeldman-nm@users.noreply.github.com>	2024-08-19 13:52:13 -07:00
Peng Guanwen	f710fb5265	[Core] Use flashinfer sampling kernel when available (#7137 ) Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-08-19 03:24:03 +00:00
SangBin Cho	4706eb628e	[aDAG] Unflake aDAG + PP tests (#7600 )	2024-08-16 20:49:30 -07:00
Alexei-V-Ivanov-AMD	6bd19551b0	.[Build/CI] Enabling passing AMD tests. (#7610 )	2024-08-16 20:25:32 -07:00
Mahesh Keralapura	93478b63d2	[Core] Fix tracking of model forward time in case of PP>1 (#7440 ) [Core] Fix tracking of model forward time to the span traces in case of PP>1 (#7440)	2024-08-16 13:46:01 -07:00
youkaichao	54bd9a03c4	register custom op for flash attn and use from torch.ops (#7536 )	2024-08-15 22:38:56 -07:00
nunjunj	3b19e39dc5	Chat method for offline llm (#5049 ) Co-authored-by: nunjunj <ray@g-3ff9f30f2ed650001.c.vllm-405802.internal> Co-authored-by: nunjunj <ray@g-1df6075697c3f0001.c.vllm-405802.internal> Co-authored-by: nunjunj <ray@g-c5a2c23abc49e0001.c.vllm-405802.internal> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-08-15 19:41:34 -07:00
youkaichao	4cd7d47fed	[ci/test] rearrange tests and make adag test soft fail (#7572 )	2024-08-15 19:39:04 -07:00
youkaichao	d3d9cb6e4b	[ci] fix model tests (#7507 )	2024-08-14 01:01:43 -07:00
Cyrus Leung	dd164d72f3	[Bugfix][Docs] Update list of mock imports (#7493 )	2024-08-13 20:37:30 -07:00
youkaichao	ea49e6a3c8	[misc][ci] fix cpu test with plugins (#7489 )	2024-08-13 19:27:46 -07:00
youkaichao	16422ea76f	[misc][plugin] add plugin system implementation (#7426 )	2024-08-13 16:24:17 -07:00
Dipika Sikka	fb377d7e74	[Misc] Update `gptq_marlin` to use new vLLMParameters (#7281 )	2024-08-13 14:30:11 -04:00
Kevin H. Luu	65950e8f58	[ci] Entrypoints run upon changes in vllm/ (#7423 ) Signed-off-by: kevin <kevin@anyscale.com>	2024-08-12 10:18:03 -07:00

1 2 3

149 Commits