squall/vllm

Author	SHA1	Message	Date
Simon Mo	bb1a784b05	Fix release-pipeline.yaml (#6388)	2024-07-12 14:00:57 -07:00
Simon Mo	d719ba24c5	Build some nightly wheels by default (#6380)	2024-07-12 13:56:59 -07:00
Cody Yu	aa48e502fb	[MISC] Upgrade dependency to PyTorch 2.3.1 (#5327)	2024-07-12 12:04:26 -07:00
Kevin H. Luu	4dbebd03cc	[ci] Add GHA workflows to enable full CI run (#6381) Signed-off-by: kevin <kevin@anyscale.com>	2024-07-12 11:36:26 -07:00
Kevin H. Luu	b75bce1008	[ci] Add grouped tests & mark tests to run by default for fastcheck pipeline (#6365) Signed-off-by: kevin <kevin@anyscale.com>	2024-07-12 09:58:38 -07:00
Yihuan Bu	b039cbbce3	[Misc] add fixture to guided processor tests (#6341)	2024-07-12 09:55:39 -07:00
Alexei-V-Ivanov-AMD	f9d25c2519	[Build/CI] Checking/Waiting for the GPU's clean state (#6379)	2024-07-12 09:42:24 -07:00
Cyrus Leung	024ad87cdc	[Bugfix] Fix dtype mismatch in PaliGemma (#6367)	2024-07-12 08:22:18 -07:00
Robert Shaw	aea19f0989	[ Misc ] Support Models With Bias in `compressed-tensors` integration (#6356)	2024-07-12 11:11:29 -04:00
Roger Wang	f7160d946a	[Misc][Bugfix] Update transformers for tokenizer issue (#6364)	2024-07-12 08:40:07 +00:00
Robert Shaw	6047187cd8	[ Misc ] Remove separate bias add (#6353)	2024-07-12 05:06:09 +00:00
Hongxia Yang	b6c16cf8ff	[ROCm][AMD] unify CUDA_VISIBLE_DEVICES usage in cuda/rocm (#6352)	2024-07-11 21:30:46 -07:00
adityagoel14	d26a8b3f1f	[CI/Build] (2/2) Switching AMD CI to store images in Docker Hub (#6350)	2024-07-11 21:26:26 -07:00
Michael Goin	d59eb98489	[Model][Phi3-Small] Remove scipy from blocksparse_attention (#6343)	2024-07-12 10:47:17 +08:00
Helena Kloosterman	adf32e0a0f	[Bugfix] Fix usage stats logging exception warning with OpenVINO (#6349)	2024-07-12 10:47:00 +08:00
youkaichao	2b0fb53481	[distributed][misc] be consistent with pytorch for libcudart.so (#6346) [distributed][misc] keep consistent with how pytorch finds libcudart.so (#6346)	2024-07-11 19:35:17 -07:00
Lily Liu	d6ab528997	[Misc] Remove flashinfer warning, add flashinfer tests to CI (#6351)	2024-07-12 01:32:06 +00:00
Robert Shaw	7ed6a4f0e1	[ BugFix ] Prompt Logprobs Detokenization (#6223) Co-authored-by: Zifei Tong <zifeitong@gmail.com>	2024-07-11 22:02:29 +00:00
Kuntai Du	a4feba929b	[CI/Build] Add nightly benchmarking for tgi, tensorrt-llm and lmdeploy (#5362)	2024-07-11 13:28:38 -07:00
youkaichao	2d23b42d92	[doc] update pipeline parallel in readme (#6347)	2024-07-11 11:38:40 -07:00
xwjiang2010	1df43de9bb	[bug fix] Fix llava next feature size calculation. (#6339) Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>	2024-07-11 17:21:10 +00:00
Simon Mo	52b7fcb35a	Benchmark: add H100 suite (#6047)	2024-07-11 09:17:07 -07:00
Robert Shaw	b675069d74	[ Misc ] Refactor Marlin Python Utilities (#6082) Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>	2024-07-11 15:40:11 +00:00
Mor Zusman	55f692b46e	[BugFix] get_and_reset only when scheduler outputs are not empty (#6266)	2024-07-11 07:40:20 -07:00
Thomas Parnell	8a1415cf77	[Bugfix] GPTBigCodeForCausalLM: Remove lm_head from supported_lora_modules. (#6326) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> Co-authored-by: Travis Johnson <tsjohnso@us.ibm.com>	2024-07-11 07:05:59 -07:00
pushan	546b101fa0	[BugFix]: fix engine timeout due to request abort (#6255) Signed-off-by: yatta zhang <ytzhang01@foxmail.com> Signed-off-by: zhangyuntao.dev <zhangyuntao.dev@bytedance.com> Co-authored-by: zhangyuntao.dev <zhangyuntao.dev@bytedance.com>	2024-07-11 06:46:31 -07:00
aniaan	3963a5335b	[Misc] refactor(config): clean up unused code (#6320)	2024-07-11 09:39:07 +00:00
Roger Wang	c4774eb841	[Bugfix] Fix snapshot download in serving benchmark (#6318)	2024-07-11 07:04:05 +00:00
Lim Xiang Yang	fc17110bbe	[BugFix]: set outlines pkg version (#6262)	2024-07-11 04:37:11 +00:00
Jie Fu (傅杰)	439c84581a	[Doc] Update description of vLLM support for CPUs (#6003)	2024-07-10 21:15:29 -07:00
daquexian	99ded1e1c4	[Doc] Remove comments incorrectly copied from another project (#6286)	2024-07-10 17:05:26 -07:00
Woosuk Kwon	997df46a32	[Bugfix][Neuron] Fix soft prompt method error in NeuronExecutor (#6313)	2024-07-10 16:39:02 -07:00
sroy745	ae151d73be	[Speculative Decoding] Enabling bonus token in speculative decoding for KV cache based models (#5765)	2024-07-10 16:02:47 -07:00
sangjune.park	44cc76610d	[Bugfix] Fix OpenVINOExecutor abstractmethod error (#6296) Signed-off-by: sangjune.park <sangjune.park@navercorp.com>	2024-07-10 10:03:32 -07:00
Benjamin Muskalla	b422d4961a	[CI/Build] Enable mypy typing for remaining folders (#6268)	2024-07-10 22:15:55 +08:00
Thomas Parnell	c38eba3046	[Bugfix] MLPSpeculator: Use ParallelLMHead in tie_weights=False case. (#6303) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2024-07-10 09:04:07 -04:00
Woosuk Kwon	e72ae80b06	[Bugfix] Support 2D input shape in MoE layer (#6287)	2024-07-10 09:03:16 -04:00
Cyrus Leung	8a924d2248	[Doc] Guide for adding multi-modal plugins (#6205)	2024-07-10 14:55:34 +08:00
Woosuk Kwon	5ed3505d82	[Bugfix][TPU] Add prompt adapter methods to TPUExecutor (#6279)	2024-07-09 19:30:56 -07:00
youkaichao	da78caecfa	[core][distributed] zmq fallback for broadcasting large objects (#6183) [core][distributed] add zmq fallback for broadcasting large objects (#6183)	2024-07-09 18:49:11 -07:00
Abhinav Goyal	2416b26e11	[Speculative Decoding] Medusa Implementation with Top-1 proposer (#4978)	2024-07-09 18:34:02 -07:00
Baoyuan Qi	d3a245138a	[Bugfix]fix and needs_scalar_to_array logic check (#6238) Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>	2024-07-09 23:43:24 +00:00
Murali Andoorveedu	673dd4cae9	[Docs] Docs update for Pipeline Parallel (#6222) Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai> Co-authored-by: Simon Mo <simon.mo@hey.com>	2024-07-09 16:24:58 -07:00
Swapnil Parekh	4d6ada947c	[CORE] Adding support for insertion of soft-tuned prompts (#4645) Co-authored-by: Swapnil Parekh <swapnilp@ibm.com> Co-authored-by: Joe G <joseph.granados@h2o.ai> Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>	2024-07-09 13:26:36 -07:00
Kevin H. Luu	a0550cbc80	Add support for multi-node on CI (#5955) Signed-off-by: kevin <kevin@anyscale.com>	2024-07-09 12:56:56 -07:00
Woosuk Kwon	08c5bdecae	[Bugfix][TPU] Fix outlines installation in TPU Dockerfile (#6256)	2024-07-09 02:56:06 -07:00
Woosuk Kwon	5d5b4c5fe5	[Bugfix][TPU] Add missing None to model input (#6245)	2024-07-09 00:21:37 -07:00
youkaichao	70c232f85a	[core][distributed] fix ray worker rank assignment (#6235)	2024-07-08 21:31:44 -07:00
youkaichao	a3c9435d93	[hardware][cuda] use device id under CUDA_VISIBLE_DEVICES for get_device_capability (#6216)	2024-07-08 20:02:15 -07:00
Simon Mo	4f0e0ea131	Add FlashInfer to default Dockerfile (#6172)	2024-07-08 13:38:03 -07:00

1891 Commits