Woosuk Kwon
|
c467dff24f
|
[Hardware][TPU] Support MoE with Pallas GMM kernel (#6457)
|
2024-07-16 09:56:28 -07:00 |
|
youkaichao
|
9f4ccec761
|
[doc][misc] remind users to cancel debugging environment variables after debugging (#6481)
|
2024-07-16 09:45:30 -07:00 |
|
Cyrus Leung
|
38ef94888a
|
[CI/Build] Remove "boardwalk" image asset (#6460)
|
2024-07-16 08:59:36 -07:00 |
|
Peng Guanwen
|
2bb0489cb3
|
[Core] Use numpy to speed up padded token processing (#6442)
|
2024-07-16 08:13:25 -07:00 |
|
Thomas Parnell
|
7508a3dc34
|
[Misc] Fix typos in spec. decode metrics logging. (#6470)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2024-07-16 13:55:15 +00:00 |
|
sasha0552
|
7a3d2a5b95
|
[Frontend] Support for chat completions input in the tokenize endpoint (#5923)
|
2024-07-16 20:18:09 +08:00 |
|
Cyrus Leung
|
d97011512e
|
[CI/Build] vLLM cache directory for images (#6444)
|
2024-07-15 23:12:25 -07:00 |
|
Woosuk Kwon
|
37d776606f
|
[Docs] Announce 5th meetup (#6458)
|
2024-07-15 21:04:58 -07:00 |
|
Joe
|
d92b3c5cde
|
[Bugfix][CI/Build] Test prompt adapters in openai entrypoint tests (#6419)
|
2024-07-15 18:54:15 -07:00 |
|
Mor Zusman
|
9ad32dacd9
|
[BugFix][Model] Jamba - Handle aborted requests, Add tests and fix cleanup bug (#6425)
Co-authored-by: Mor Zusman <morz@ai21.com>
|
2024-07-16 01:32:55 +00:00 |
|
Kevin H. Luu
|
d6f3b3d5c4
|
Pin sphinx-argparse version (#6453)
Signed-off-by: kevin <kevin@anyscale.com>
|
2024-07-16 01:26:11 +00:00 |
|
Woosuk Kwon
|
4552e37b55
|
[CI/Build][TPU] Add TPU CI test (#6277)
Co-authored-by: kevin <kevin@anyscale.com>
|
2024-07-15 14:31:16 -07:00 |
|
Woosuk Kwon
|
ec9933f4a5
|
[Misc] Add CustomOp Interface to UnquantizedFusedMoEMethod (#6289)
|
2024-07-15 19:02:14 +00:00 |
|
Woosuk Kwon
|
3dee97b05f
|
[Docs] Add Google Cloud to sponsor list (#6450)
|
2024-07-15 11:58:10 -07:00 |
|
youkaichao
|
4cf256ae7f
|
[misc][distributed] fix pp missing layer condition (#6446)
|
2024-07-15 10:32:35 -07:00 |
|
Simon Mo
|
64fdc08c72
|
bump version to v0.5.2 (#6433)
|
2024-07-15 17:27:40 +00:00 |
|
Thomas Parnell
|
4ef95b0f06
|
[Bugfix] use float32 precision in samplers/test_logprobs.py for comparing with HF (#6409)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2024-07-15 13:14:49 -04:00 |
|
Thomas Parnell
|
eaec4b9153
|
[Bugfix] Add custom Triton cache manager to resolve MoE MP issue (#6140)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Chih-Chieh-Yang <chih.chieh.yang@ibm.com>
|
2024-07-15 10:12:47 -07:00 |
|
Pernekhan Utemuratov
|
a63a4c6341
|
[Misc] Use 0.0.9 version for flashinfer (#6447)
Co-authored-by: Pernekhan Utemuratov <pernekhan@deepinfra.com>
|
2024-07-15 10:10:26 -07:00 |
|
Tyler Michael Smith
|
c8fd97f26d
|
[Kernel] Use CUTLASS kernels for the FP8 layers with Bias (#6270)
|
2024-07-15 13:05:52 -04:00 |
|
youkaichao
|
94b82e8c18
|
[doc][distributed] add suggestion for distributed inference (#6418)
|
2024-07-15 09:45:51 -07:00 |
|
Roger Wang
|
6ae1597ddf
|
[VLM] Minor space optimization for ClipVisionModel (#6436)
|
2024-07-15 17:29:51 +08:00 |
|
youkaichao
|
22e79ee8f3
|
[doc][misc] doc update (#6439)
|
2024-07-14 23:33:25 -07:00 |
|
Cyrus Leung
|
de19916314
|
[Bugfix] Convert image to RGB by default (#6430)
|
2024-07-15 05:39:15 +00:00 |
|
youkaichao
|
69672f116c
|
[core][distributed] simplify code to support pipeline parallel (#6406)
|
2024-07-14 21:20:51 -07:00 |
|
DefTruth
|
44874a0bf9
|
[Doc] add env docs for flashinfer backend (#6437)
|
2024-07-14 21:16:51 -07:00 |
|
zifeitong
|
b47008b4d2
|
[BugFix] BatchResponseData body should be optional (#6345)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2024-07-15 04:06:09 +00:00 |
|
Simon Mo
|
9bfece89fd
|
Add FUNDING.yml (#6435)
|
2024-07-14 20:36:16 -07:00 |
|
Simon Mo
|
32c9d7f765
|
Report usage for beam search (#6404)
|
2024-07-14 19:37:35 -07:00 |
|
Fish
|
ccb20db8bd
|
[Bugfix] Benchmark serving script used global parameter 'args' in function 'sample_random_requests' (#6428)
|
2024-07-14 19:27:01 -07:00 |
|
Robert Shaw
|
a754dc2cb9
|
[CI/Build] Cross python wheel (#6394)
|
2024-07-14 18:54:46 -07:00 |
|
Robert Cohn
|
61e85dbad8
|
[Doc] xpu backend requires running setvars.sh (#6393)
|
2024-07-14 17:10:11 -07:00 |
|
Ethan Xu
|
dbfe254eda
|
[Feature] vLLM CLI (#5090)
Co-authored-by: simon-mo <simon.mo@hey.com>
|
2024-07-14 15:36:43 -07:00 |
|
Robert Shaw
|
73030b7dae
|
[ Misc ] Enable Quantizing All Layers of DeekSeekv2 (#6423)
|
2024-07-14 21:38:42 +00:00 |
|
youkaichao
|
ccd3c04571
|
[ci][build] fix commit id (#6420)
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2024-07-14 22:16:21 +08:00 |
|
Tyler Michael Smith
|
9dad5cc859
|
[Kernel] Turn off CUTLASS scaled_mm for Ada Lovelace (#6384)
|
2024-07-14 13:37:19 +00:00 |
|
Yuan Tang
|
6ef3bf912c
|
Remove unnecessary trailing period in spec_decode.rst (#6405)
|
2024-07-14 07:58:09 +00:00 |
|
Isotr0py
|
540c0368b1
|
[Model] Initialize Fuyu-8B support (#3924)
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-07-14 05:27:14 +00:00 |
|
Robert Shaw
|
fb6af8bc08
|
[ Misc ] Apply MoE Refactor to Deepseekv2 To Support Fp8 (#6417)
|
2024-07-13 20:03:58 -07:00 |
|
Woosuk Kwon
|
eeceadaecc
|
[Misc] Add deprecation warning for beam search (#6402)
|
2024-07-13 11:52:22 -07:00 |
|
Robert Shaw
|
babf52dade
|
[ Misc ] More Cleanup of Marlin (#6359)
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
|
2024-07-13 10:21:37 +00:00 |
|
Noam Gat
|
9da4aad44b
|
Updating LM Format Enforcer version to v10.3 (#6411)
|
2024-07-13 10:09:12 +00:00 |
|
youkaichao
|
41708e5034
|
[ci] try to add multi-node tests (#6280)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
|
2024-07-12 21:51:48 -07:00 |
|
Woosuk Kwon
|
d80aef3776
|
[Docs] Clean up latest news (#6401)
|
2024-07-12 19:36:53 -07:00 |
|
Thomas Parnell
|
e1684a766a
|
[Bugfix] Fix hard-coded value of x in context_attention_fwd (#6373)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2024-07-12 18:30:54 -07:00 |
|
Saliya Ekanayake
|
a27f87da34
|
[Doc] Fix Typo in Doc (#6392)
Co-authored-by: Saliya Ekanayake <esaliya@d-matrix.ai>
|
2024-07-13 00:48:23 +00:00 |
|
Kevin H. Luu
|
16ff6bd58c
|
[ci] Fix wording for GH bot (#6398)
Signed-off-by: kevin <kevin@anyscale.com>
|
2024-07-12 16:34:37 -07:00 |
|
Woosuk Kwon
|
f8f9ff57ee
|
[Bugfix][TPU] Fix megacore setting for v5e-litepod (#6397)
|
2024-07-12 15:59:47 -07:00 |
|
Simon Mo
|
6bc9710f6e
|
Fix release pipeline's dir permission (#6391)
|
2024-07-12 15:52:43 -07:00 |
|
Michael Goin
|
111fc6e7ec
|
[Misc] Add generated git commit hash as vllm.__commit__ (#6386)
|
2024-07-12 22:52:15 +00:00 |
|