squall/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Cyrus Leung	9389380015	[Doc] Move guide for multimodal model and other improvements (#6168)	2024-07-06 17:18:59 +08:00
Roger Wang	175c43eca4	[Doc] Reorganize Supported Models by Type (#6167)	2024-07-06 05:59:36 +00:00
Simon Mo	bc96d5c330	Move release wheel env var to Dockerfile instead (#6163)	2024-07-05 17:19:53 -07:00
Simon Mo	f0250620dd	Fix release wheel build env var (#6162)	2024-07-05 16:24:31 -07:00
Simon Mo	2de490d60f	Update wheel builds to strip debug (#6161)	2024-07-05 14:51:25 -07:00
Simon Mo	79d406e918	[Docs] Fix readthedocs for tag build (#6158)	2024-07-05 12:44:40 -07:00
Simon Mo	abad5746a7	bump version to v0.5.1 (#6157)	2024-07-05 12:04:51 -07:00
JGSweets	e58294ddf2	[Bugfix] Add verbose error if scipy is missing for blocksparse attention (#5695)	2024-07-05 10:41:01 -07:00
jvlunteren	f1e15da6fe	[Frontend] Continuous usage stats in OpenAI completion API (#5742)	2024-07-05 10:37:09 -07:00
Christian Rohmann	0097bb1829	[Bugfix] Use templated datasource in grafana.json to allow automatic imports (#6136) Signed-off-by: Christian Rohmann <christian.rohmann@inovex.de>	2024-07-05 09:49:47 -07:00
Cyrus Leung	ea4b570483	[VLM] Cleanup validation and update docs (#6149)	2024-07-05 05:49:38 +00:00
Roger Wang	a41357e941	[VLM] Improve consistency between feature size calculation and dummy data for profiling (#6146)	2024-07-05 09:29:47 +08:00
Cyrus Leung	ae96ef8fbd	[VLM] Calculate maximum number of multi-modal tokens by model (#6121)	2024-07-04 16:37:23 -07:00
Lily Liu	69ec3ca14c	[Kernel][Model] logits_soft_cap for Gemma2 with flashinfer (#6051) Co-authored-by: Simon Mo <simon.mo@hey.com>	2024-07-04 16:35:51 -07:00
Yuan	81d7a50f24	[Hardware][Intel CPU] Adding intel openmp tunings in Docker file (#6008) Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>	2024-07-04 15:22:12 -07:00
youkaichao	27902d42be	[misc][doc] try to add warning for latest html (#5979)	2024-07-04 09:57:09 -07:00
Gregory Shtrasberg	56b325e977	[ROCm][AMD][Model]Adding alibi slopes support in ROCm triton flash attention and naive flash attention (#6043) Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com>	2024-07-03 22:19:38 -07:00
Cyrus Leung	3dd507083f	[CI/Build] Cleanup VLM tests (#6107)	2024-07-03 18:58:18 -07:00
Murali Andoorveedu	0ed646b7aa	[Distributed][Core] Support Py39 and Py38 for PP (#6120) Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>	2024-07-03 17:52:29 -07:00
Travis Johnson	1dab9bc8a9	[Bugfix] set OMP_NUM_THREADS to 1 by default for multiprocessing (#6109) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com> Co-authored-by: Nick Hill <nickhill@us.ibm.com>	2024-07-03 16:56:59 -07:00
youkaichao	3de6e6a30e	[core][distributed] support n layers % pp size != 0 (#6115)	2024-07-03 16:40:31 -07:00
youkaichao	966fe72141	[doc][misc] bump up py version in installation doc (#6119)	2024-07-03 15:52:04 -07:00
Robert Shaw	62963d129e	[ Misc ] Clean Up `CompressedTensorsW8A8` (#6113)	2024-07-03 22:50:08 +00:00
xwjiang2010	d9e98f42e4	[vlm] Remove vision language config. (#6089) Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2024-07-03 22:14:16 +00:00
youkaichao	3c6325f0fc	[core][distributed] custom allreduce when pp size > 1 (#6117)	2024-07-03 14:41:32 -07:00
Michael Goin	47f0954af0	[Kernel] Expand FP8 support to Ampere GPUs using FP8 Marlin (#5975)	2024-07-03 17:38:00 +00:00
Roger Wang	7cd2ebb025	[Bugfix] Fix `compute_logits` in Jamba (#6093)	2024-07-03 00:32:35 -07:00
Roger Wang	f1c78138aa	[Doc] Fix Mock Import (#6094)	2024-07-03 00:13:56 -07:00
Roger Wang	3a86b54fb0	[VLM][Frontend] Proper Image Prompt Formatting from OpenAI API (#6091) Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-07-02 23:41:23 -07:00
youkaichao	f666207161	[misc][distributed] error on invalid state (#6092)	2024-07-02 23:37:29 -07:00
Nick Hill	d830656a97	[BugFix] Avoid unnecessary Ray import warnings (#6079)	2024-07-03 14:09:40 +08:00
SangBin Cho	d18bab3587	[CI] Fix base url doesn't strip "/" (#6087)	2024-07-02 21:31:25 -07:00
Cyrus Leung	9831aec49f	[Core] Dynamic image size support for VLMs (#5276) Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com> Co-authored-by: Xiaowei Jiang <xwjiang2010@gmail.com> Co-authored-by: ywang96 <ywang@roblox.com> Co-authored-by: xwjiang2010 <87673679+xwjiang2010@users.noreply.github.com> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>	2024-07-02 20:34:00 -07:00
youkaichao	482045ee77	[hardware][misc] introduce platform abstraction (#6080)	2024-07-02 20:12:22 -07:00
Mor Zusman	9d6a8daa87	[Model] Jamba support (#4115) Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai> Co-authored-by: Erez Schwartz <erezs@ai21.com> Co-authored-by: Mor Zusman <morz@ai21.com> Co-authored-by: tomeras91 <57313761+tomeras91@users.noreply.github.com> Co-authored-by: Tomer Asida <tomera@ai21.com> Co-authored-by: Zhuohan Li <zhuohan123@gmail.com> Co-authored-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>	2024-07-02 23:11:29 +00:00
Qubitium-ModelCloud	ee93f4f92a	[CORE] Quantized lm-head Framework (#4442) Co-authored-by: Robert Shaw <rshaw@neuralmagic.com> Co-authored-by: ZX <zx@lbx.dev>	2024-07-02 22:25:17 +00:00
Robert Shaw	7c008c51a9	[ Misc ] Refactor MoE to isolate Fp8 From Mixtral (#5970) Co-authored-by: Robert Shaw <rshaw@neuralmagic> Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-07-02 21:54:35 +00:00
Robert Shaw	4d26d806e1	Update conftest.py (#6076)	2024-07-02 20:14:22 +00:00
Murali Andoorveedu	c5832d2ae9	[Core] Pipeline Parallel Support (#4412) Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>	2024-07-02 10:58:08 -07:00
Sirej Dua	15aba081f3	[Speculative Decoding] MLPSpeculator Tensor Parallel support (1/2) (#6050) Co-authored-by: Sirej Dua <sirej.dua@databricks.com> Co-authored-by: Sirej Dua <Sirej Dua>	2024-07-02 07:20:29 -07:00
Cyrus Leung	31354e563f	[Doc] Reinstate doc dependencies (#6061)	2024-07-02 10:53:16 +00:00
xwjiang2010	98d6682cd1	[VLM] Remove `image_input_type` from VLM config (#5852) Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2024-07-02 07:57:09 +00:00
danieljannai21	2c37540aa6	[Frontend] Add template related params to request (#5709)	2024-07-01 23:01:57 -07:00
Alexander Matveev	3476ed0809	[Core] Optimize block_manager_v2 vs block_manager_v1 (to make V2 default) (#5602)	2024-07-01 20:10:37 -07:00
Thomas Parnell	54600709b6	[Model] Changes to MLPSpeculator to support tie_weights and input_scale (#5965) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> Co-authored-by: Joshua Rosenkranz <jmrosenk@us.ibm.com>	2024-07-01 16:40:02 -07:00
James Whedbee	e373853e12	[Frontend] Relax api url assertion for openai benchmarking (#6046)	2024-07-01 23:39:10 +00:00
Nick Hill	c87ebc3ef9	[BugFix] Ensure worker model loop is always stopped at the right time (#5987)	2024-07-01 16:17:58 -07:00
Antoni Baum	c4059ea54f	[Bugfix] Add explicit `end_forward` calls to flashinfer (#6044)	2024-07-01 23:08:58 +00:00
Roger Wang	8e0817c262	[Bugfix][Doc] Fix Doc Formatting (#6048)	2024-07-01 15:09:11 -07:00
ning.zhang	83bdcb6ac3	add FAQ doc under 'serving' (#5946)	2024-07-01 14:11:36 -07:00

1981 Commits