squall/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Woosuk Kwon	d80aef3776	[Docs] Clean up latest news (#6401)	2024-07-12 19:36:53 -07:00
Thomas Parnell	e1684a766a	[Bugfix] Fix hard-coded value of x in context_attention_fwd (#6373) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2024-07-12 18:30:54 -07:00
Saliya Ekanayake	a27f87da34	[Doc] Fix Typo in Doc (#6392) Co-authored-by: Saliya Ekanayake <esaliya@d-matrix.ai>	2024-07-13 00:48:23 +00:00
Kevin H. Luu	16ff6bd58c	[ci] Fix wording for GH bot (#6398) Signed-off-by: kevin <kevin@anyscale.com>	2024-07-12 16:34:37 -07:00
Woosuk Kwon	f8f9ff57ee	[Bugfix][TPU] Fix megacore setting for v5e-litepod (#6397)	2024-07-12 15:59:47 -07:00
Simon Mo	6bc9710f6e	Fix release pipeline's dir permission (#6391)	2024-07-12 15:52:43 -07:00
Michael Goin	111fc6e7ec	[Misc] Add generated git commit hash as `vllm.__commit__` (#6386)	2024-07-12 22:52:15 +00:00
Cody Yu	75f64d8b94	[Bugfix] Fix illegal memory access in FP8 MoE kernel (#6382)	2024-07-12 21:33:33 +00:00
Simon Mo	21b2dcedab	Fix release pipeline's -e flag (#6390)	2024-07-12 14:08:04 -07:00
Simon Mo	07b35af86d	Fix interpolation in release pipeline (#6389)	2024-07-12 14:03:39 -07:00
Simon Mo	bb1a784b05	Fix release-pipeline.yaml (#6388)	2024-07-12 14:00:57 -07:00
Simon Mo	d719ba24c5	Build some nightly wheels by default (#6380)	2024-07-12 13:56:59 -07:00
Cody Yu	aa48e502fb	[MISC] Upgrade dependency to PyTorch 2.3.1 (#5327)	2024-07-12 12:04:26 -07:00
Kevin H. Luu	4dbebd03cc	[ci] Add GHA workflows to enable full CI run (#6381) Signed-off-by: kevin <kevin@anyscale.com>	2024-07-12 11:36:26 -07:00
Kevin H. Luu	b75bce1008	[ci] Add grouped tests & mark tests to run by default for fastcheck pipeline (#6365) Signed-off-by: kevin <kevin@anyscale.com>	2024-07-12 09:58:38 -07:00
Yihuan Bu	b039cbbce3	[Misc] add fixture to guided processor tests (#6341)	2024-07-12 09:55:39 -07:00
Alexei-V-Ivanov-AMD	f9d25c2519	[Build/CI] Checking/Waiting for the GPU's clean state (#6379)	2024-07-12 09:42:24 -07:00
Cyrus Leung	024ad87cdc	[Bugfix] Fix dtype mismatch in PaliGemma (#6367)	2024-07-12 08:22:18 -07:00
Robert Shaw	aea19f0989	[ Misc ] Support Models With Bias in `compressed-tensors` integration (#6356)	2024-07-12 11:11:29 -04:00
Roger Wang	f7160d946a	[Misc][Bugfix] Update transformers for tokenizer issue (#6364)	2024-07-12 08:40:07 +00:00
Robert Shaw	6047187cd8	[ Misc ] Remove separate bias add (#6353)	2024-07-12 05:06:09 +00:00
Hongxia Yang	b6c16cf8ff	[ROCm][AMD] unify CUDA_VISIBLE_DEVICES usage in cuda/rocm (#6352)	2024-07-11 21:30:46 -07:00
adityagoel14	d26a8b3f1f	[CI/Build] (2/2) Switching AMD CI to store images in Docker Hub (#6350)	2024-07-11 21:26:26 -07:00
Michael Goin	d59eb98489	[Model][Phi3-Small] Remove scipy from blocksparse_attention (#6343)	2024-07-12 10:47:17 +08:00
Helena Kloosterman	adf32e0a0f	[Bugfix] Fix usage stats logging exception warning with OpenVINO (#6349)	2024-07-12 10:47:00 +08:00
youkaichao	2b0fb53481	[distributed][misc] be consistent with pytorch for libcudart.so (#6346) [distributed][misc] keep consistent with how pytorch finds libcudart.so (#6346)	2024-07-11 19:35:17 -07:00
Lily Liu	d6ab528997	[Misc] Remove flashinfer warning, add flashinfer tests to CI (#6351)	2024-07-12 01:32:06 +00:00
Robert Shaw	7ed6a4f0e1	[ BugFix ] Prompt Logprobs Detokenization (#6223) Co-authored-by: Zifei Tong <zifeitong@gmail.com>	2024-07-11 22:02:29 +00:00
Kuntai Du	a4feba929b	[CI/Build] Add nightly benchmarking for tgi, tensorrt-llm and lmdeploy (#5362)	2024-07-11 13:28:38 -07:00
youkaichao	2d23b42d92	[doc] update pipeline parallel in readme (#6347)	2024-07-11 11:38:40 -07:00
xwjiang2010	1df43de9bb	[bug fix] Fix llava next feature size calculation. (#6339) Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>	2024-07-11 17:21:10 +00:00
Simon Mo	52b7fcb35a	Benchmark: add H100 suite (#6047)	2024-07-11 09:17:07 -07:00
Robert Shaw	b675069d74	[ Misc ] Refactor Marlin Python Utilities (#6082) Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>	2024-07-11 15:40:11 +00:00
Mor Zusman	55f692b46e	[BugFix] get_and_reset only when scheduler outputs are not empty (#6266)	2024-07-11 07:40:20 -07:00
Thomas Parnell	8a1415cf77	[Bugfix] GPTBigCodeForCausalLM: Remove lm_head from supported_lora_modules. (#6326) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> Co-authored-by: Travis Johnson <tsjohnso@us.ibm.com>	2024-07-11 07:05:59 -07:00
pushan	546b101fa0	[BugFix]: fix engine timeout due to request abort (#6255) Signed-off-by: yatta zhang <ytzhang01@foxmail.com> Signed-off-by: zhangyuntao.dev <zhangyuntao.dev@bytedance.com> Co-authored-by: zhangyuntao.dev <zhangyuntao.dev@bytedance.com>	2024-07-11 06:46:31 -07:00
aniaan	3963a5335b	[Misc] refactor(config): clean up unused code (#6320)	2024-07-11 09:39:07 +00:00
Roger Wang	c4774eb841	[Bugfix] Fix snapshot download in serving benchmark (#6318)	2024-07-11 07:04:05 +00:00
Lim Xiang Yang	fc17110bbe	[BugFix]: set outlines pkg version (#6262)	2024-07-11 04:37:11 +00:00
Jie Fu (傅杰)	439c84581a	[Doc] Update description of vLLM support for CPUs (#6003)	2024-07-10 21:15:29 -07:00
daquexian	99ded1e1c4	[Doc] Remove comments incorrectly copied from another project (#6286)	2024-07-10 17:05:26 -07:00
Woosuk Kwon	997df46a32	[Bugfix][Neuron] Fix soft prompt method error in NeuronExecutor (#6313)	2024-07-10 16:39:02 -07:00
sroy745	ae151d73be	[Speculative Decoding] Enabling bonus token in speculative decoding for KV cache based models (#5765)	2024-07-10 16:02:47 -07:00
sangjune.park	44cc76610d	[Bugfix] Fix OpenVINOExecutor abstractmethod error (#6296) Signed-off-by: sangjune.park <sangjune.park@navercorp.com>	2024-07-10 10:03:32 -07:00
Benjamin Muskalla	b422d4961a	[CI/Build] Enable mypy typing for remaining folders (#6268)	2024-07-10 22:15:55 +08:00
Thomas Parnell	c38eba3046	[Bugfix] MLPSpeculator: Use ParallelLMHead in tie_weights=False case. (#6303) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2024-07-10 09:04:07 -04:00
Woosuk Kwon	e72ae80b06	[Bugfix] Support 2D input shape in MoE layer (#6287)	2024-07-10 09:03:16 -04:00
Cyrus Leung	8a924d2248	[Doc] Guide for adding multi-modal plugins (#6205)	2024-07-10 14:55:34 +08:00
Woosuk Kwon	5ed3505d82	[Bugfix][TPU] Add prompt adapter methods to TPUExecutor (#6279)	2024-07-09 19:30:56 -07:00
youkaichao	da78caecfa	[core][distributed] zmq fallback for broadcasting large objects (#6183) [core][distributed] add zmq fallback for broadcasting large objects (#6183)	2024-07-09 18:49:11 -07:00

2051 Commits