squall/vllm - vllm

Author	SHA1	Message	Date
youkaichao	99aa4eddaf	[torch.compile] register allreduce operations as custom ops (#8526)	2024-09-16 22:57:57 -07:00
Cyrus Leung	a84e598e21	[CI/Build] Reorganize models tests (#7820)	2024-09-13 10:20:06 -07:00
Nick Hill	551ce01078	[Core] Add engine option to return only deltas or final output (#7381)	2024-09-12 12:02:00 -07:00
Joe Runde	f2e263b801	[Bugfix] Offline mode fix (#8376) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2024-09-12 11:11:57 -07:00
Lily Liu	775f00f81e	[Speculative Decoding] Test refactor (#8317) Co-authored-by: youkaichao <youkaichao@126.com>	2024-09-11 14:07:34 -07:00
Alexey Kondratiev(AMD)	f421f3cefb	[CI/Build] Enabling kernels tests for AMD, ignoring some of then that fail (#8130)	2024-09-10 11:51:15 -07:00
Dipika Sikka	6cd5e5b07e	[Misc] Fused MoE Marlin support for GPTQ (#8217)	2024-09-09 23:02:52 -04:00
Cyrus Leung	288a938872	[Doc] Indicate more information about supported modalities (#8181)	2024-09-05 10:51:53 +00:00
Kevin H. Luu	ba262c4e5a	[ci] Mark LoRA test as soft-fail (#8160) Signed-off-by: kevin <kevin@anyscale.com>	2024-09-04 20:33:12 -07:00
Kyle Mistele	e02ce498be	[Feature] OpenAI-Compatible Tools API + Streaming for Hermes & Mistral models (#5649) Co-authored-by: constellate <constellate@1-ai-appserver-staging.codereach.com> Co-authored-by: Kyle Mistele <kyle@constellate.ai>	2024-09-04 13:18:13 -07:00
alexeykondrat	d1dec64243	[CI/Build][ROCm] Enabling LoRA tests on ROCm (#7369) Co-authored-by: Simon Mo <simon.mo@hey.com>	2024-09-04 11:57:54 -07:00
Roger Wang	5231f0898e	[Frontend][VLM] Add support for multiple multi-modal items (#8049)	2024-08-31 16:35:53 -07:00
youkaichao	ce6bf3a2cf	[torch.compile] avoid Dynamo guard evaluation overhead (#7898) Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-08-28 16:10:12 -07:00
alexeykondrat	42e932c7d4	[CI/Build][ROCm] Enabling tensorizer tests for ROCm (#7237)	2024-08-27 10:09:13 -07:00
youkaichao	7d9ffa2ae1	[misc][core] lazy import outlines (#7831)	2024-08-24 00:51:38 -07:00
Alexander Matveev	9db93de20c	[Core] Add multi-step support to LLMEngine (#7789)	2024-08-23 12:45:53 -07:00
SangBin Cho	c01a6cb231	[Ray backend] Better error when pg topology is bad. (#7584) Co-authored-by: youkaichao <youkaichao@126.com>	2024-08-22 17:44:25 -07:00
youkaichao	8c6f694a79	[ci] refine dependency for distributed tests (#7776)	2024-08-22 00:54:15 -07:00
William Lin	5844017285	[ci] [multi-step] narrow multi-step test dependency paths (#7760)	2024-08-21 15:52:40 -07:00
Robert Shaw	f7e3b0c5aa	[Bugfix][Frontend] Fix Issues Under High Load With `zeromq` Frontend (#7394) Co-authored-by: Nick Hill <nickhill@us.ibm.com>	2024-08-21 13:34:14 -04:00
Ronen Schaffer	2aa00d59ad	[CI/Build] Pin OpenTelemetry versions and make errors clearer (#7266) [CI/Build] Pin OpenTelemetry versions and make a availability errors clearer (#7266)	2024-08-20 10:02:21 -07:00
William Lin	47b65a5508	[core] Multi Step Scheduling (#7000) Co-authored-by: afeldman-nm <156691304+afeldman-nm@users.noreply.github.com>	2024-08-19 13:52:13 -07:00
Peng Guanwen	f710fb5265	[Core] Use flashinfer sampling kernel when available (#7137) Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-08-19 03:24:03 +00:00
SangBin Cho	4706eb628e	[aDAG] Unflake aDAG + PP tests (#7600)	2024-08-16 20:49:30 -07:00
Alexei-V-Ivanov-AMD	6bd19551b0	.[Build/CI] Enabling passing AMD tests. (#7610)	2024-08-16 20:25:32 -07:00
Mahesh Keralapura	93478b63d2	[Core] Fix tracking of model forward time in case of PP>1 (#7440) [Core] Fix tracking of model forward time to the span traces in case of PP>1 (#7440)	2024-08-16 13:46:01 -07:00
youkaichao	54bd9a03c4	register custom op for flash attn and use from torch.ops (#7536)	2024-08-15 22:38:56 -07:00
nunjunj	3b19e39dc5	Chat method for offline llm (#5049) Co-authored-by: nunjunj <ray@g-3ff9f30f2ed650001.c.vllm-405802.internal> Co-authored-by: nunjunj <ray@g-1df6075697c3f0001.c.vllm-405802.internal> Co-authored-by: nunjunj <ray@g-c5a2c23abc49e0001.c.vllm-405802.internal> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-08-15 19:41:34 -07:00
youkaichao	4cd7d47fed	[ci/test] rearrange tests and make adag test soft fail (#7572)	2024-08-15 19:39:04 -07:00
youkaichao	d3d9cb6e4b	[ci] fix model tests (#7507)	2024-08-14 01:01:43 -07:00
Cyrus Leung	dd164d72f3	[Bugfix][Docs] Update list of mock imports (#7493)	2024-08-13 20:37:30 -07:00
youkaichao	ea49e6a3c8	[misc][ci] fix cpu test with plugins (#7489)	2024-08-13 19:27:46 -07:00
youkaichao	16422ea76f	[misc][plugin] add plugin system implementation (#7426)	2024-08-13 16:24:17 -07:00
Dipika Sikka	fb377d7e74	[Misc] Update `gptq_marlin` to use new vLLMParameters (#7281)	2024-08-13 14:30:11 -04:00
Kevin H. Luu	65950e8f58	[ci] Entrypoints run upon changes in vllm/ (#7423) Signed-off-by: kevin <kevin@anyscale.com>	2024-08-12 10:18:03 -07:00
Lily Liu	ec2affa8ae	[Kernel] Flashinfer correctness fix for v0.1.3 (#7319)	2024-08-12 07:59:17 +00:00
afeldman-nm	fd95e026e0	[Core] Subclass ModelRunner to support cross-attention & encoder sequences (towards eventual encoder/decoder model support) (#4942) Co-authored-by: Andrew Feldman <afeld2012@gmail.com> Co-authored-by: Nick Hill <nickhill@us.ibm.com>	2024-08-06 16:51:47 -04:00
Simon Mo	e3c664bfcb	[Build] Add initial conditional testing spec (#6841)	2024-08-05 17:39:22 -07:00
youkaichao	04e5583425	[ci][distributed] merge distributed test commands (#7097) Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-08-02 21:33:53 -07:00
Sage Moore	7e0861bd0b	[CI/Build] Update PyTorch to 2.4.0 (#6951) Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-08-01 11:11:24 -07:00
Cyrus Leung	f230cc2ca6	[Bugfix] Fix broadcasting logic for `multi_modal_kwargs` (#6836)	2024-07-31 10:38:45 +08:00
Simon Mo	40c27a7cbb	[Build] Temporarily Disable Kernels and LoRA tests (#6961)	2024-07-30 14:59:48 -07:00
Roger Wang	ecb33a28cb	[CI/Build][Doc] Update CI and Doc for VLM example changes (#6860)	2024-07-27 09:54:14 +00:00
Sanger Steel	969d032265	[Bugfix]: Fix Tensorizer test failures (#6835)	2024-07-26 20:02:25 -07:00
Kevin H. Luu	2eb9f4ff26	[ci] Mark tensorizer as soft fail and separate from grouped test (#6810) [ci] Mark tensorizer test as soft fail and separate it from grouped test in fast check (#6810) Signed-off-by: kevin <kevin@anyscale.com>	2024-07-25 18:08:33 -07:00
Matt Wong	06d6c5fe9f	[Bugfix][CI/Build][Hardware][AMD] Fix AMD tests, add HF cache, update CK FA, add partially supported model notes (#6543)	2024-07-20 09:39:07 -07:00
Nick Hill	b5672a112c	[Core] Multiprocessing Pipeline Parallel support (#6130) Co-authored-by: Murali Andoorveedu <muralidhar.andoorveedu@centml.ai>	2024-07-18 19:15:52 -07:00
Tyler Michael Smith	1689219ebf	[CI/Build] Build on Ubuntu 20.04 instead of 22.04 (#6517)	2024-07-18 17:29:25 -07:00
youkaichao	f53b8f0d05	[ci][test] add correctness test for cpu offloading (#6549)	2024-07-18 23:41:06 +00:00
Rui Qiao	61e592747c	[Core] Introduce SPMD worker execution using Ray accelerated DAG (#6032) Signed-off-by: Rui Qiao <ruisearch42@gmail.com> Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>	2024-07-17 22:27:09 -07:00

134 Commits