squall/vllm - vllm (Gitea)

Author	SHA1	Message	Date
Jungho Christopher Cho	c0d8f1636c	[Model] SiglipVisionModel ported from transformers (#6942 ) Co-authored-by: Roger Wang <ywang@roblox.com>	2024-08-05 06:22:12 +00:00
Cyrus Leung	cc08fc7225	[Frontend] Reapply "Factor out code for running uvicorn" (#7095 )	2024-08-04 20:40:51 -07:00
Alphi	7b86e7c9cd	[Model] Add multi-image support for minicpmv (#7122 ) Co-authored-by: hezhihui <hzh7269@modelbest.cn> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-08-05 09:23:17 +08:00
Jee Jee Li	f80ab3521c	Clean up remaining Punica C information (#7027 )	2024-08-04 15:37:08 -07:00
youkaichao	16a1cc9bb2	[misc][distributed] improve libcudart.so finding (#7127 )	2024-08-04 11:31:51 -07:00
Thomas Parnell	b1c9aa3daa	[Bugfix] [SpecDecode] Default speculative_draft_tensor_parallel_size to 1 when using MLPSpeculator (#7105 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2024-08-04 07:13:18 -07:00
Jee Jee Li	179a6a36f2	[Model]Refactor MiniCPMV (#7020 ) Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-08-04 08:12:41 +00:00
youkaichao	83c644fe7e	[core][misc] simply output processing with shortcut code path (#7117 )	2024-08-04 00:22:19 -07:00
youkaichao	9fadc7b7a0	[misc] add zmq in collect env (#7119 )	2024-08-03 22:03:46 -07:00
Yihuan Bu	654bc5ca49	Support for guided decoding for offline LLM (#6878 ) Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-08-04 03:12:09 +00:00
Jeff Fialho	825b044863	[Frontend] Warn if user `max_model_len` is greater than derived `max_model_len` (#7080 ) Signed-off-by: Jefferson Fialho <jfialho@ibm.com> Co-authored-by: Nick Hill <nickhill@us.ibm.com>	2024-08-03 16:01:38 -07:00
youkaichao	44dcb52e39	[ci][test] finalize fork_new_process_for_each_test (#7114 )	2024-08-03 10:44:53 -07:00
Kuntai Du	67d745cc68	[CI] Temporarily turn off H100 performance benchmark (#7104 )	2024-08-02 23:52:44 -07:00
Jee Jee Li	99d7cabd7b	[LoRA] ReplicatedLinear support LoRA (#7081 )	2024-08-02 22:40:19 -07:00
Zach Zheng	fb2c1c86c1	[Bugfix] Fix block table for seqs that have prefix cache hits (#7018 )	2024-08-02 22:38:15 -07:00
Isotr0py	0c25435daa	[Model] Refactor and decouple weight loading logic for InternVL2 model (#7067 )	2024-08-02 22:36:14 -07:00
youkaichao	a0d164567c	[ci][distributed] disable ray dag tests (#7099 )	2024-08-02 22:32:04 -07:00
youkaichao	04e5583425	[ci][distributed] merge distributed test commands (#7097 ) Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-08-02 21:33:53 -07:00
Cyrus Leung	8c025fa703	[Frontend] Factor out chat message parsing (#7055 )	2024-08-02 21:31:27 -07:00
youkaichao	69ea15e5cc	[ci][distributed] shorten wait time if server hangs (#7098 )	2024-08-02 21:05:16 -07:00
Robert Shaw	ed812a73fa	[ Frontend ] Multiprocessing for OpenAI Server with `zeromq` (#6883 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com> Co-authored-by: Joe Runde <Joseph.Runde@ibm.com> Co-authored-by: Joe Runde <joe@joerun.de> Co-authored-by: Nick Hill <nickhill@us.ibm.com> Co-authored-by: Simon Mo <simon.mo@hey.com>	2024-08-02 18:27:28 -07:00
youkaichao	708989341e	[misc] add a flag to enable compile (#7092 )	2024-08-02 16:18:45 -07:00
Rui Qiao	22e718ff1a	[Misc] Revive to use loopback address for driver IP (#7091 ) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2024-08-02 15:50:00 -07:00
Rui Qiao	05308891e2	[Core] Pipeline parallel with Ray ADAG (#6837 ) Support pipeline-parallelism with Ray accelerated DAG. Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2024-08-02 13:55:40 -07:00
Lucas Wilkinson	a8d604ca2a	[Misc] Disambiguate quantized types via a new ScalarType (#6396 )	2024-08-02 13:51:58 -07:00
Michael Goin	b482b9a5b1	[CI/Build] Add support for Python 3.12 (#7035 )	2024-08-02 13:51:22 -07:00
youkaichao	806949514a	[ci] set timeout for test_oot_registration.py (#7082 )	2024-08-02 10:03:24 -07:00
Jie Fu (傅杰)	c16eaac500	[Hardware][Intel CPU] Update torch 2.4.0 for CPU backend (#6931 )	2024-08-02 08:55:58 -07:00
Peng Guanwen	db35186391	[Core] Comment out unused code in sampler (#7023 )	2024-08-02 00:58:26 -07:00
youkaichao	660dea1235	[cuda][misc] remove error_on_invalid_device_count_status (#7069 )	2024-08-02 00:14:21 -07:00
Bongwon Jang	cf2a1a4d9d	Fix tracing.py (#7065 )	2024-08-01 23:28:00 -07:00
youkaichao	252357793d	[ci][distributed] try to fix pp test (#7054 )	2024-08-01 22:03:12 -07:00
Cyrus Leung	3bb4b1e4cd	[mypy] Speed up mypy checking (#7056 )	2024-08-01 19:49:43 -07:00
Lily Liu	954f7305a1	[Kernel] Fix input for flashinfer prefill wrapper. (#7008 )	2024-08-01 18:44:16 -07:00
Woosuk Kwon	6ce01f3066	[Performance] Optimize `get_seqs` (#7051 )	2024-08-01 18:29:52 -07:00
Tyler Michael Smith	6a11fdfbb8	[CI/Build][Bugfix] Fix CUTLASS header-only line (#7034 )	2024-08-01 13:51:15 -07:00
Woosuk Kwon	805a8a75f2	[Misc] Support attention logits soft-capping with flash-attn (#7022 )	2024-08-01 13:14:37 -07:00
omkar kakarparthi	562e580abc	Update run-amd-test.sh (#7044 )	2024-08-01 13:12:37 -07:00
Murali Andoorveedu	fc912e0886	[Models] Support Qwen model with PP (#6974 ) Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>	2024-08-01 12:40:43 -07:00
Michael Goin	f4fd390f5d	[Bugfix] Lower gemma's unloaded_params exception to warning (#7002 )	2024-08-01 12:01:07 -07:00
Michael Goin	fb3db61688	[CI/Build] Remove sparseml requirement from testing (#7037 )	2024-08-01 12:00:51 -07:00
Isotr0py	2dd34371a6	[Bugfix] Fix RMSNorm forward in InternViT attention qk_layernorm (#6992 )	2024-08-01 12:00:28 -07:00
Sage Moore	7e0861bd0b	[CI/Build] Update PyTorch to 2.4.0 (#6951 ) Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-08-01 11:11:24 -07:00
Alexei-V-Ivanov-AMD	a72a424b3e	[Build/CI] Fixing Docker Hub quota issue. (#7043 )	2024-08-01 11:07:37 -07:00
youkaichao	c8a7e93273	[core][scheduler] simplify and improve scheduler (#6867 )	2024-07-31 23:51:09 -07:00
zifeitong	3c10591ef2	[Bugfix] Set SamplingParams.max_tokens for OpenAI requests if not provided by user (#6954 )	2024-07-31 21:13:34 -07:00
Aurick Qiao	0437492ea9	PP comm optimization: replace send with partial send + allgather (#6695 ) Co-authored-by: Aurick Qiao <aurick.qiao@snowflake.com>	2024-07-31 20:15:42 -07:00
Travis Johnson	630dd9e0ae	[Bugfix][Model] Skip loading lm_head weights if using tie_word_embeddings (#6758 ) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>	2024-07-31 19:49:11 -07:00
Woosuk Kwon	23993a7997	[Bugfix][TPU] Do not use torch.Generator for TPUs (#6981 )	2024-07-31 18:50:28 -07:00
xuyi	1d2e7fb73f	[Model] Pipeline parallel support for Qwen2 (#6924 )	2024-07-31 18:49:51 -07:00


2196 Commits