squall/vllm - vllm

Author	SHA1	Message	Date
xiaobochen123	660470e5a3	[Core] Optimize evictor-v2 performance (#7193)	2024-08-06 12:34:25 -07:00
Luka Govedič	8d59dbb000	[Kernel] Add per-tensor and per-token AZP epilogues (#5941) Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2024-08-06 18:17:08 +00:00
Lily Liu	5c60c8c423	[SpecDecode] [Minor] Fix spec decode sampler tests (#7183)	2024-08-06 10:40:32 -07:00
Katarzyna Papis	00afc78590	[Bugfix] add gguf dependency (#7198) Co-authored-by: katarzyna.papis <kpapis@kpapis-u20.sclab.intel.com>	2024-08-06 10:08:35 -07:00
Robert Shaw	541c1852d3	[ BugFix ] Fix ZMQ when `VLLM_PORT` is set (#7205)	2024-08-06 09:26:26 -07:00
Dipika Sikka	a3bbbfa1d8	[BugFix] Fix DeepSeek remote code (#7178)	2024-08-06 08:16:53 -07:00
Cyrus Leung	1f26efbb3a	[Model] Support SigLIP encoder and alternative decoders for LLaVA models (#7153) Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>	2024-08-06 16:55:31 +08:00
Jee Jee Li	9118217f58	[LoRA] Relax LoRA condition (#7146)	2024-08-06 01:57:25 +00:00
Simon Mo	e3c664bfcb	[Build] Add initial conditional testing spec (#6841)	2024-08-05 17:39:22 -07:00
Isotr0py	360bd67cf0	[Core] Support loading GGUF model (#5191) Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-08-05 17:54:23 -06:00
Cody Yu	ef527be06c	[MISC] Use non-blocking transfer in prepare_input (#7172)	2024-08-05 23:41:27 +00:00
Jacob Schein	89b8db6bb2	[Bugfix] Specify device when loading LoRA and embedding tensors (#7129) Co-authored-by: Jacob Schein <jacobschein@Jacobs-MacBook-Pro-2.local>	2024-08-05 16:35:47 -07:00
Thomas Parnell	789937af2e	[Doc] [SpecDecode] Update MLPSpeculator documentation (#7100) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2024-08-05 23:29:43 +00:00
youkaichao	dfb1a15dcb	[ci][frontend] deduplicate tests (#7101)	2024-08-05 15:59:22 -07:00
Simon Mo	4db5176d97	bump version to v0.5.4 (#7139)	2024-08-05 14:39:48 -07:00
Tyler Michael Smith	4cf1dc39be	[Bugfix][CI/Build] Fix CUTLASS FetchContent (#7171)	2024-08-05 14:22:57 -07:00
Tyler Michael Smith	6e4852ce28	[CI/Build] Suppress divide-by-zero and missing return statement warnings (#7001)	2024-08-05 16:00:01 -04:00
Tyler Michael Smith	8571ac4672	[Kernel] Update CUTLASS to 3.5.1 (#7085)	2024-08-05 15:13:43 -04:00
Rui Qiao	997cf78308	[Misc] Fix typo in GroupCoordinator.recv() (#7167) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2024-08-05 11:10:16 -07:00
Aditya Paliwal	57f560aa23	[BugFix] Use args.trust_remote_code (#7121)	2024-08-05 09:26:14 -07:00
Nick Hill	003f8ee128	[BugFix] Use IP4 localhost form for zmq bind (#7163)	2024-08-05 08:41:03 -07:00
Bongwon Jang	e9630458c7	[SpecDecode] Support FlashInfer in DraftModelRunner (#6926)	2024-08-05 08:05:05 -07:00
Cade Daniel	82a1b1a82b	[Speculative decoding] Add periodic log with time spent in proposal/scoring/verification (#6963)	2024-08-05 08:46:44 +00:00
Jungho Christopher Cho	c0d8f1636c	[Model] SiglipVisionModel ported from transformers (#6942) Co-authored-by: Roger Wang <ywang@roblox.com>	2024-08-05 06:22:12 +00:00
Cyrus Leung	cc08fc7225	[Frontend] Reapply "Factor out code for running uvicorn" (#7095)	2024-08-04 20:40:51 -07:00
Alphi	7b86e7c9cd	[Model] Add multi-image support for minicpmv (#7122) Co-authored-by: hezhihui <hzh7269@modelbest.cn> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-08-05 09:23:17 +08:00
Jee Jee Li	f80ab3521c	Clean up remaining Punica C information (#7027)	2024-08-04 15:37:08 -07:00
youkaichao	16a1cc9bb2	[misc][distributed] improve libcudart.so finding (#7127)	2024-08-04 11:31:51 -07:00
Thomas Parnell	b1c9aa3daa	[Bugfix] [SpecDecode] Default speculative_draft_tensor_parallel_size to 1 when using MLPSpeculator (#7105) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2024-08-04 07:13:18 -07:00
Jee Jee Li	179a6a36f2	[Model]Refactor MiniCPMV (#7020) Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-08-04 08:12:41 +00:00
youkaichao	83c644fe7e	[core][misc] simply output processing with shortcut code path (#7117)	2024-08-04 00:22:19 -07:00
youkaichao	9fadc7b7a0	[misc] add zmq in collect env (#7119)	2024-08-03 22:03:46 -07:00
Yihuan Bu	654bc5ca49	Support for guided decoding for offline LLM (#6878) Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-08-04 03:12:09 +00:00
Jeff Fialho	825b044863	[Frontend] Warn if user `max_model_len` is greater than derived `max_model_len` (#7080) Signed-off-by: Jefferson Fialho <jfialho@ibm.com> Co-authored-by: Nick Hill <nickhill@us.ibm.com>	2024-08-03 16:01:38 -07:00
youkaichao	44dcb52e39	[ci][test] finalize fork_new_process_for_each_test (#7114)	2024-08-03 10:44:53 -07:00
Kuntai Du	67d745cc68	[CI] Temporarily turn off H100 performance benchmark (#7104)	2024-08-02 23:52:44 -07:00
Jee Jee Li	99d7cabd7b	[LoRA] ReplicatedLinear support LoRA (#7081)	2024-08-02 22:40:19 -07:00
Zach Zheng	fb2c1c86c1	[Bugfix] Fix block table for seqs that have prefix cache hits (#7018)	2024-08-02 22:38:15 -07:00
Isotr0py	0c25435daa	[Model] Refactor and decouple weight loading logic for InternVL2 model (#7067)	2024-08-02 22:36:14 -07:00
youkaichao	a0d164567c	[ci][distributed] disable ray dag tests (#7099)	2024-08-02 22:32:04 -07:00
youkaichao	04e5583425	[ci][distributed] merge distributed test commands (#7097) Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-08-02 21:33:53 -07:00
Cyrus Leung	8c025fa703	[Frontend] Factor out chat message parsing (#7055)	2024-08-02 21:31:27 -07:00
youkaichao	69ea15e5cc	[ci][distributed] shorten wait time if server hangs (#7098)	2024-08-02 21:05:16 -07:00
Robert Shaw	ed812a73fa	[ Frontend ] Multiprocessing for OpenAI Server with `zeromq` (#6883) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com> Co-authored-by: Joe Runde <Joseph.Runde@ibm.com> Co-authored-by: Joe Runde <joe@joerun.de> Co-authored-by: Nick Hill <nickhill@us.ibm.com> Co-authored-by: Simon Mo <simon.mo@hey.com>	2024-08-02 18:27:28 -07:00
youkaichao	708989341e	[misc] add a flag to enable compile (#7092)	2024-08-02 16:18:45 -07:00
Rui Qiao	22e718ff1a	[Misc] Revive to use loopback address for driver IP (#7091) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2024-08-02 15:50:00 -07:00
Rui Qiao	05308891e2	[Core] Pipeline parallel with Ray ADAG (#6837) Support pipeline-parallelism with Ray accelerated DAG. Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2024-08-02 13:55:40 -07:00
Lucas Wilkinson	a8d604ca2a	[Misc] Disambiguate quantized types via a new ScalarType (#6396)	2024-08-02 13:51:58 -07:00
Michael Goin	b482b9a5b1	[CI/Build] Add support for Python 3.12 (#7035)	2024-08-02 13:51:22 -07:00
youkaichao	806949514a	[ci] set timeout for test_oot_registration.py (#7082)	2024-08-02 10:03:24 -07:00

2219 Commits