squall/vllm - vllm

Author	SHA1	Message	Date
Kyle Mistele	e02ce498be	[Feature] OpenAI-Compatible Tools API + Streaming for Hermes & Mistral models (#5649 ) Co-authored-by: constellate <constellate@1-ai-appserver-staging.codereach.com> Co-authored-by: Kyle Mistele <kyle@constellate.ai>	2024-09-04 13:18:13 -07:00
alexeykondrat	d1dec64243	[CI/Build][ROCm] Enabling LoRA tests on ROCm (#7369 ) Co-authored-by: Simon Mo <simon.mo@hey.com>	2024-09-04 11:57:54 -07:00
Cody Yu	2ad2e5608e	[MISC] Consolidate FP8 kv-cache tests (#8131 )	2024-09-04 18:53:25 +00:00
TimWang	ccd7207191	chore: Update check-wheel-size.py to read MAX_SIZE_MB from env (#8103 )	2024-09-03 23:17:05 -07:00
Roger Wang	5231f0898e	[Frontend][VLM] Add support for multiple multi-modal items (#8049 )	2024-08-31 16:35:53 -07:00
Michael Goin	af59df0a10	Remove faulty Meta-Llama-3-8B-Instruct-FP8.yaml lm-eval test (#7961 )	2024-08-28 19:19:17 -04:00
youkaichao	ce6bf3a2cf	[torch.compile] avoid Dynamo guard evaluation overhead (#7898 ) Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-08-28 16:10:12 -07:00
alexeykondrat	42e932c7d4	[CI/Build][ROCm] Enabling tensorizer tests for ROCm (#7237 )	2024-08-27 10:09:13 -07:00
youkaichao	64cc644425	[core][torch.compile] discard the compile for profiling (#7796 )	2024-08-26 21:33:58 -07:00
youkaichao	7d9ffa2ae1	[misc][core] lazy import outlines (#7831 )	2024-08-24 00:51:38 -07:00
Alexander Matveev	9db93de20c	[Core] Add multi-step support to LLMEngine (#7789 )	2024-08-23 12:45:53 -07:00
SangBin Cho	c01a6cb231	[Ray backend] Better error when pg topology is bad. (#7584 ) Co-authored-by: youkaichao <youkaichao@126.com>	2024-08-22 17:44:25 -07:00
youkaichao	8c6f694a79	[ci] refine dependency for distributed tests (#7776 )	2024-08-22 00:54:15 -07:00
Luka Govedič	7937009a7e	[Kernel] Replaced `blockReduce[...]` functions with `cub::BlockReduce` (#7233 ) Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-08-21 20:18:00 -04:00
William Lin	5844017285	[ci] [multi-step] narrow multi-step test dependency paths (#7760 )	2024-08-21 15:52:40 -07:00
Robert Shaw	f7e3b0c5aa	[Bugfix][Frontend] Fix Issues Under High Load With `zeromq` Frontend (#7394 ) Co-authored-by: Nick Hill <nickhill@us.ibm.com>	2024-08-21 13:34:14 -04:00
Ronen Schaffer	2aa00d59ad	[CI/Build] Pin OpenTelemetry versions and make errors clearer (#7266 )	2024-08-20 10:02:21 -07:00
Kuntai Du	3d8a5f063d	[CI] Organizing performance benchmark files (#7616 )	2024-08-19 22:43:54 -07:00
William Lin	47b65a5508	[core] Multi Step Scheduling (#7000 ) Co-authored-by: afeldman-nm <156691304+afeldman-nm@users.noreply.github.com>	2024-08-19 13:52:13 -07:00
Peng Guanwen	f710fb5265	[Core] Use flashinfer sampling kernel when available (#7137 ) Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-08-19 03:24:03 +00:00
Alex Brooks	40e1360bb6	[CI/Build] Add text-only test for Qwen models (#7475 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2024-08-19 07:43:46 +08:00
SangBin Cho	4706eb628e	[aDAG] Unflake aDAG + PP tests (#7600 )	2024-08-16 20:49:30 -07:00
Alexei-V-Ivanov-AMD	6bd19551b0	.[Build/CI] Enabling passing AMD tests. (#7610 )	2024-08-16 20:25:32 -07:00
Michael Goin	44f26a9466	[Model] Align nemotron config with final HF state and fix lm-eval-small (#7611 )	2024-08-16 15:56:34 -07:00
Mahesh Keralapura	93478b63d2	[Core] Fix tracking of model forward time in case of PP>1 (#7440 )	2024-08-16 13:46:01 -07:00
Kuntai Du	6fc5b0f249	[CI] Fix crashes of performance benchmark (#7500 )	2024-08-16 08:08:45 -07:00
youkaichao	54bd9a03c4	register custom op for flash attn and use from torch.ops (#7536 )	2024-08-15 22:38:56 -07:00
nunjunj	3b19e39dc5	Chat method for offline llm (#5049 ) Co-authored-by: nunjunj <ray@g-3ff9f30f2ed650001.c.vllm-405802.internal> Co-authored-by: nunjunj <ray@g-1df6075697c3f0001.c.vllm-405802.internal> Co-authored-by: nunjunj <ray@g-c5a2c23abc49e0001.c.vllm-405802.internal> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-08-15 19:41:34 -07:00
youkaichao	4cd7d47fed	[ci/test] rearrange tests and make adag test soft fail (#7572 )	2024-08-15 19:39:04 -07:00
PHILO-HE	f4da5f7b6d	[Misc] Update dockerfile for CPU to cover protobuf installation (#7182 )	2024-08-15 10:03:01 -07:00
youkaichao	d3d9cb6e4b	[ci] fix model tests (#7507 )	2024-08-14 01:01:43 -07:00
Cyrus Leung	dd164d72f3	[Bugfix][Docs] Update list of mock imports (#7493 )	2024-08-13 20:37:30 -07:00
youkaichao	ea49e6a3c8	[misc][ci] fix cpu test with plugins (#7489 )	2024-08-13 19:27:46 -07:00
youkaichao	16422ea76f	[misc][plugin] add plugin system implementation (#7426 )	2024-08-13 16:24:17 -07:00
Dipika Sikka	fb377d7e74	[Misc] Update `gptq_marlin` to use new vLLMParameters (#7281 )	2024-08-13 14:30:11 -04:00
Dipika Sikka	181abbc27d	[Misc] Update LM Eval Tolerance (#7473 )	2024-08-13 14:28:14 -04:00
Kevin H. Luu	65950e8f58	[ci] Entrypoints run upon changes in vllm/ (#7423 ) Signed-off-by: kevin <kevin@anyscale.com>	2024-08-12 10:18:03 -07:00
Lily Liu	ec2affa8ae	[Kernel] Flashinfer correctness fix for v0.1.3 (#7319 )	2024-08-12 07:59:17 +00:00
Kevin H. Luu	469b3bc538	[ci] Make building wheels per commit optional (#7278 ) Signed-off-by: kevin <kevin@anyscale.com>	2024-08-07 11:34:25 -07:00
afeldman-nm	fd95e026e0	[Core] Subclass ModelRunner to support cross-attention & encoder sequences (towards eventual encoder/decoder model support) (#4942 ) Co-authored-by: Andrew Feldman <afeld2012@gmail.com> Co-authored-by: Nick Hill <nickhill@us.ibm.com>	2024-08-06 16:51:47 -04:00
Dipika Sikka	a3bbbfa1d8	[BugFix] Fix DeepSeek remote code (#7178 )	2024-08-06 08:16:53 -07:00
Simon Mo	e3c664bfcb	[Build] Add initial conditional testing spec (#6841 )	2024-08-05 17:39:22 -07:00
Kuntai Du	67d745cc68	[CI] Temporarily turn off H100 performance benchmark (#7104 )	2024-08-02 23:52:44 -07:00
youkaichao	04e5583425	[ci][distributed] merge distributed test commands (#7097 ) Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-08-02 21:33:53 -07:00
omkar kakarparthi	562e580abc	Update run-amd-test.sh (#7044 )	2024-08-01 13:12:37 -07:00
Sage Moore	7e0861bd0b	[CI/Build] Update PyTorch to 2.4.0 (#6951 ) Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-08-01 11:11:24 -07:00
Alexei-V-Ivanov-AMD	a72a424b3e	[Build/CI] Fixing Docker Hub quota issue. (#7043 )	2024-08-01 11:07:37 -07:00
HandH1998	6512937de1	Support W4A8 quantization for vllm (#5218 )	2024-07-31 07:55:21 -06:00
Cyrus Leung	f230cc2ca6	[Bugfix] Fix broadcasting logic for `multi_modal_kwargs` (#6836 )	2024-07-31 10:38:45 +08:00
Cade Daniel	c32ab8be1a	[Speculative decoding] Add serving benchmark for llama3 70b + speculative decoding (#6964 )	2024-07-31 00:53:21 +00:00
Cade Daniel	fb4f530bf5	[CI] [nightly benchmark] Do not re-download sharegpt dataset if exists (#6706 )	2024-07-30 16:28:49 -07:00
Cade Daniel	79319cedfa	[Nightly benchmarking suite] Remove pkill python from run benchmark suite (#6965 )	2024-07-30 16:28:05 -07:00
Simon Mo	40c27a7cbb	[Build] Temporarily Disable Kernels and LoRA tests (#6961 )	2024-07-30 14:59:48 -07:00
Roger Wang	ecb33a28cb	[CI/Build][Doc] Update CI and Doc for VLM example changes (#6860 )	2024-07-27 09:54:14 +00:00
Joe	14dbd5a767	[Model] H2O Danube3-4b (#6451 )	2024-07-26 20:47:50 -07:00
Sanger Steel	969d032265	[Bugfix]: Fix Tensorizer test failures (#6835 )	2024-07-26 20:02:25 -07:00
Li, Jiang	3bbb4936dc	[Hardware] [Intel] Enable Multiprocessing and tensor parallel in CPU backend and update documentation (#6125 )	2024-07-26 13:50:10 -07:00
Michael Goin	07278c37dd	[Model] Support Nemotron models (Nemotron-3, Nemotron-4, Minitron) (#6611 )	2024-07-26 14:33:42 -04:00
Kevin H. Luu	2eb9f4ff26	[ci] Mark tensorizer as soft fail and separate from grouped test (#6810 ) Signed-off-by: kevin <kevin@anyscale.com>	2024-07-25 18:08:33 -07:00
Kuntai Du	6a1e25b151	[Doc] Add documentations for nightly benchmarks (#6412 )	2024-07-25 11:57:16 -07:00
Robert Shaw	889da130e7	[ Misc ] `fp8-marlin` channelwise via `compressed-tensors` (#6524 ) Co-authored-by: mgoin <michael@neuralmagic.com>	2024-07-25 09:46:04 -07:00
Alexei-V-Ivanov-AMD	b570811706	[Build/CI] Update run-amd-test.sh. Enable Docker Hub login. (#6711 )	2024-07-24 05:01:14 -07:00
youkaichao	72fc704803	[build] relax wheel size limit (#6704 )	2024-07-23 14:03:49 -07:00
Kevin H. Luu	69d5ae38dc	[ci] Use different sccache bucket for CUDA 11.8 wheel build (#6656 ) Signed-off-by: kevin <kevin@anyscale.com>	2024-07-22 14:20:41 -07:00
Robert Shaw	9364f74eee	[ Kernel ] Enable `fp8-marlin` for `fbgemm-fp8` models (#6606 )	2024-07-20 18:50:10 +00:00
Matt Wong	06d6c5fe9f	[Bugfix][CI/Build][Hardware][AMD] Fix AMD tests, add HF cache, update CK FA, add partially supported model notes (#6543 )	2024-07-20 09:39:07 -07:00
Robert Shaw	683e3cb9c4	[ Misc ] `fbgemm` checkpoints (#6559 )	2024-07-20 09:36:57 -07:00
Robert Shaw	4cc24f01b1	[ Kernel ] Enable Dynamic Per Token `fp8` (#6547 )	2024-07-19 23:08:15 +00:00
Robert Shaw	dbe5588554	[ Misc ] non-uniform quantization via `compressed-tensors` for `Llama` (#6515 )	2024-07-18 22:39:18 -04:00
Nick Hill	b5672a112c	[Core] Multiprocessing Pipeline Parallel support (#6130 ) Co-authored-by: Murali Andoorveedu <muralidhar.andoorveedu@centml.ai>	2024-07-18 19:15:52 -07:00
Tyler Michael Smith	1689219ebf	[CI/Build] Build on Ubuntu 20.04 instead of 22.04 (#6517 )	2024-07-18 17:29:25 -07:00
youkaichao	f53b8f0d05	[ci][test] add correctness test for cpu offloading (#6549 )	2024-07-18 23:41:06 +00:00
Rui Qiao	61e592747c	[Core] Introduce SPMD worker execution using Ray accelerated DAG (#6032 ) Signed-off-by: Rui Qiao <ruisearch42@gmail.com> Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>	2024-07-17 22:27:09 -07:00
youkaichao	1c27d25fb5	[core][model] yet another cpu offload implementation (#6496 ) Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-07-17 20:54:35 -07:00
youkaichao	09c2eb85dd	[ci][distributed] add pipeline parallel correctness test (#6410 )	2024-07-16 15:44:22 -07:00
Cyrus Leung	d97011512e	[CI/Build] vLLM cache directory for images (#6444 )	2024-07-15 23:12:25 -07:00
Woosuk Kwon	4552e37b55	[CI/Build][TPU] Add TPU CI test (#6277 ) Co-authored-by: kevin <kevin@anyscale.com>	2024-07-15 14:31:16 -07:00
youkaichao	69672f116c	[core][distributed] simplify code to support pipeline parallel (#6406 )	2024-07-14 21:20:51 -07:00
Robert Shaw	a754dc2cb9	[CI/Build] Cross python wheel (#6394 )	2024-07-14 18:54:46 -07:00
Robert Shaw	73030b7dae	[ Misc ] Enable Quantizing All Layers of DeekSeekv2 (#6423 )	2024-07-14 21:38:42 +00:00
youkaichao	ccd3c04571	[ci][build] fix commit id (#6420 ) Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2024-07-14 22:16:21 +08:00
Tyler Michael Smith	9dad5cc859	[Kernel] Turn off CUTLASS scaled_mm for Ada Lovelace (#6384 )	2024-07-14 13:37:19 +00:00
Robert Shaw	fb6af8bc08	[ Misc ] Apply MoE Refactor to Deepseekv2 To Support Fp8 (#6417 )	2024-07-13 20:03:58 -07:00
Robert Shaw	babf52dade	[ Misc ] More Cleanup of Marlin (#6359 ) Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>	2024-07-13 10:21:37 +00:00
youkaichao	41708e5034	[ci] try to add multi-node tests (#6280 ) Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai> Co-authored-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>	2024-07-12 21:51:48 -07:00
Simon Mo	6bc9710f6e	Fix release pipeline's dir permission (#6391 )	2024-07-12 15:52:43 -07:00
Simon Mo	21b2dcedab	Fix release pipeline's -e flag (#6390 )	2024-07-12 14:08:04 -07:00
Simon Mo	07b35af86d	Fix interpolation in release pipeline (#6389 )	2024-07-12 14:03:39 -07:00
Simon Mo	bb1a784b05	Fix release-pipeline.yaml (#6388 )	2024-07-12 14:00:57 -07:00
Simon Mo	d719ba24c5	Build some nightly wheels by default (#6380 )	2024-07-12 13:56:59 -07:00
Kevin H. Luu	b75bce1008	[ci] Add grouped tests & mark tests to run by default for fastcheck pipeline (#6365 ) Signed-off-by: kevin <kevin@anyscale.com>	2024-07-12 09:58:38 -07:00
Alexei-V-Ivanov-AMD	f9d25c2519	[Build/CI] Checking/Waiting for the GPU's clean state (#6379 )	2024-07-12 09:42:24 -07:00
Robert Shaw	aea19f0989	[ Misc ] Support Models With Bias in `compressed-tensors` integration (#6356 )	2024-07-12 11:11:29 -04:00
adityagoel14	d26a8b3f1f	[CI/Build] (2/2) Switching AMD CI to store images in Docker Hub (#6350 )	2024-07-11 21:26:26 -07:00
Lily Liu	d6ab528997	[Misc] Remove flashinfer warning, add flashinfer tests to CI (#6351 )	2024-07-12 01:32:06 +00:00
Robert Shaw	7ed6a4f0e1	[ BugFix ] Prompt Logprobs Detokenization (#6223 ) Co-authored-by: Zifei Tong <zifeitong@gmail.com>	2024-07-11 22:02:29 +00:00
Kuntai Du	a4feba929b	[CI/Build] Add nightly benchmarking for tgi, tensorrt-llm and lmdeploy (#5362 )	2024-07-11 13:28:38 -07:00
Simon Mo	52b7fcb35a	Benchmark: add H100 suite (#6047 )	2024-07-11 09:17:07 -07:00
Kevin H. Luu	a0550cbc80	Add support for multi-node on CI (#5955 ) Signed-off-by: kevin <kevin@anyscale.com>	2024-07-09 12:56:56 -07:00
Robert Shaw	abfe705a02	[ Misc ] Support Fp8 via `llm-compressor` (#6110 ) Co-authored-by: Robert Shaw <rshaw@neuralmagic>	2024-07-07 20:42:11 +00:00

279 Commits