1981 Commits

Author SHA1 Message Date
Woosuk Kwon
3dee97b05f
[Docs] Add Google Cloud to sponsor list (#6450) 2024-07-15 11:58:10 -07:00
youkaichao
4cf256ae7f
[misc][distributed] fix pp missing layer condition (#6446) 2024-07-15 10:32:35 -07:00
Simon Mo
64fdc08c72
bump version to v0.5.2 (#6433) 2024-07-15 17:27:40 +00:00
Thomas Parnell
4ef95b0f06
[Bugfix] use float32 precision in samplers/test_logprobs.py for comparing with HF (#6409)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2024-07-15 13:14:49 -04:00
Thomas Parnell
eaec4b9153
[Bugfix] Add custom Triton cache manager to resolve MoE MP issue (#6140)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Chih-Chieh-Yang <chih.chieh.yang@ibm.com>
2024-07-15 10:12:47 -07:00
Pernekhan Utemuratov
a63a4c6341
[Misc] Use 0.0.9 version for flashinfer (#6447)
Co-authored-by: Pernekhan Utemuratov <pernekhan@deepinfra.com>
2024-07-15 10:10:26 -07:00
Tyler Michael Smith
c8fd97f26d
[Kernel] Use CUTLASS kernels for the FP8 layers with Bias (#6270) 2024-07-15 13:05:52 -04:00
youkaichao
94b82e8c18
[doc][distributed] add suggestion for distributed inference (#6418) 2024-07-15 09:45:51 -07:00
Roger Wang
6ae1597ddf
[VLM] Minor space optimization for ClipVisionModel (#6436) 2024-07-15 17:29:51 +08:00
youkaichao
22e79ee8f3
[doc][misc] doc update (#6439) 2024-07-14 23:33:25 -07:00
Cyrus Leung
de19916314
[Bugfix] Convert image to RGB by default (#6430) 2024-07-15 05:39:15 +00:00
youkaichao
69672f116c
[core][distributed] simplify code to support pipeline parallel (#6406) 2024-07-14 21:20:51 -07:00
DefTruth
44874a0bf9
[Doc] add env docs for flashinfer backend (#6437) 2024-07-14 21:16:51 -07:00
zifeitong
b47008b4d2
[BugFix] BatchResponseData body should be optional (#6345)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-07-15 04:06:09 +00:00
Simon Mo
9bfece89fd
Add FUNDING.yml (#6435) 2024-07-14 20:36:16 -07:00
Simon Mo
32c9d7f765
Report usage for beam search (#6404) 2024-07-14 19:37:35 -07:00
Fish
ccb20db8bd
[Bugfix] Benchmark serving script used global parameter 'args' in function 'sample_random_requests' (#6428) 2024-07-14 19:27:01 -07:00
Robert Shaw
a754dc2cb9
[CI/Build] Cross python wheel (#6394) 2024-07-14 18:54:46 -07:00
Robert Cohn
61e85dbad8
[Doc] xpu backend requires running setvars.sh (#6393) 2024-07-14 17:10:11 -07:00
Ethan Xu
dbfe254eda
[Feature] vLLM CLI (#5090)
Co-authored-by: simon-mo <simon.mo@hey.com>
2024-07-14 15:36:43 -07:00
Robert Shaw
73030b7dae
[ Misc ] Enable Quantizing All Layers of DeekSeekv2 (#6423) 2024-07-14 21:38:42 +00:00
youkaichao
ccd3c04571
[ci][build] fix commit id (#6420)
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2024-07-14 22:16:21 +08:00
Tyler Michael Smith
9dad5cc859
[Kernel] Turn off CUTLASS scaled_mm for Ada Lovelace (#6384) 2024-07-14 13:37:19 +00:00
Yuan Tang
6ef3bf912c
Remove unnecessary trailing period in spec_decode.rst (#6405) 2024-07-14 07:58:09 +00:00
Isotr0py
540c0368b1
[Model] Initialize Fuyu-8B support (#3924)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-14 05:27:14 +00:00
Robert Shaw
fb6af8bc08
[ Misc ] Apply MoE Refactor to Deepseekv2 To Support Fp8 (#6417) 2024-07-13 20:03:58 -07:00
Woosuk Kwon
eeceadaecc
[Misc] Add deprecation warning for beam search (#6402) 2024-07-13 11:52:22 -07:00
Robert Shaw
babf52dade
[ Misc ] More Cleanup of Marlin (#6359)
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
2024-07-13 10:21:37 +00:00
Noam Gat
9da4aad44b
Updating LM Format Enforcer version to v10.3 (#6411) 2024-07-13 10:09:12 +00:00
youkaichao
41708e5034
[ci] try to add multi-node tests (#6280)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
2024-07-12 21:51:48 -07:00
Woosuk Kwon
d80aef3776
[Docs] Clean up latest news (#6401) 2024-07-12 19:36:53 -07:00
Thomas Parnell
e1684a766a
[Bugfix] Fix hard-coded value of x in context_attention_fwd (#6373)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2024-07-12 18:30:54 -07:00
Saliya Ekanayake
a27f87da34
[Doc] Fix Typo in Doc (#6392)
Co-authored-by: Saliya Ekanayake <esaliya@d-matrix.ai>
2024-07-13 00:48:23 +00:00
Kevin H. Luu
16ff6bd58c
[ci] Fix wording for GH bot (#6398)
Signed-off-by: kevin <kevin@anyscale.com>
2024-07-12 16:34:37 -07:00
Woosuk Kwon
f8f9ff57ee
[Bugfix][TPU] Fix megacore setting for v5e-litepod (#6397) 2024-07-12 15:59:47 -07:00
Simon Mo
6bc9710f6e
Fix release pipeline's dir permission (#6391) 2024-07-12 15:52:43 -07:00
Michael Goin
111fc6e7ec
[Misc] Add generated git commit hash as vllm.__commit__ (#6386) 2024-07-12 22:52:15 +00:00
Cody Yu
75f64d8b94
[Bugfix] Fix illegal memory access in FP8 MoE kernel (#6382) 2024-07-12 21:33:33 +00:00
Simon Mo
21b2dcedab
Fix release pipeline's -e flag (#6390) 2024-07-12 14:08:04 -07:00
Simon Mo
07b35af86d
Fix interpolation in release pipeline (#6389) 2024-07-12 14:03:39 -07:00
Simon Mo
bb1a784b05
Fix release-pipeline.yaml (#6388) 2024-07-12 14:00:57 -07:00
Simon Mo
d719ba24c5
Build some nightly wheels by default (#6380) 2024-07-12 13:56:59 -07:00
Cody Yu
aa48e502fb
[MISC] Upgrade dependency to PyTorch 2.3.1 (#5327) 2024-07-12 12:04:26 -07:00
Kevin H. Luu
4dbebd03cc
[ci] Add GHA workflows to enable full CI run (#6381)
Signed-off-by: kevin <kevin@anyscale.com>
2024-07-12 11:36:26 -07:00
Kevin H. Luu
b75bce1008
[ci] Add grouped tests & mark tests to run by default for fastcheck pipeline (#6365)
Signed-off-by: kevin <kevin@anyscale.com>
2024-07-12 09:58:38 -07:00
Yihuan Bu
b039cbbce3
[Misc] add fixture to guided processor tests (#6341) 2024-07-12 09:55:39 -07:00
Alexei-V-Ivanov-AMD
f9d25c2519
[Build/CI] Checking/Waiting for the GPU's clean state (#6379) 2024-07-12 09:42:24 -07:00
Cyrus Leung
024ad87cdc
[Bugfix] Fix dtype mismatch in PaliGemma (#6367) 2024-07-12 08:22:18 -07:00
Robert Shaw
aea19f0989
[ Misc ] Support Models With Bias in compressed-tensors integration (#6356) 2024-07-12 11:11:29 -04:00
Roger Wang
f7160d946a
[Misc][Bugfix] Update transformers for tokenizer issue (#6364) 2024-07-12 08:40:07 +00:00