shangmingc
|
a19e8d3726
|
[Misc][Speculative decoding] Typos and typing fixes (#6467)
Co-authored-by: caishangming.csm <caishangming.csm@alibaba-inc.com>
|
2024-07-17 07:17:07 +00:00 |
|
Hongxia Yang
|
10383887e0
|
[ROCm] Cleanup Dockerfile and remove outdated patch (#6482)
|
2024-07-16 22:47:02 -07:00 |
|
Wushi Dong
|
1d094fd7c0
|
[Distributed][PP] only create embedding & lm head when necessary (#6455)
original title: [Distributed][Model] Rank-based Component Creation for Pipeline Parallelism Memory Optimization
|
2024-07-16 19:20:26 -07:00 |
|
youkaichao
|
ce37be7ba0
|
[misc][distributed] add seed to dummy weights (#6491)
|
2024-07-16 19:16:34 -07:00 |
|
youkaichao
|
7f62077af5
|
[misc][distributed] improve tests (#6488)
|
2024-07-16 17:35:52 -07:00 |
|
youkaichao
|
09c2eb85dd
|
[ci][distributed] add pipeline parallel correctness test (#6410)
|
2024-07-16 15:44:22 -07:00 |
|
Michael Goin
|
978aed5300
|
[Kernel][Attention] Separate Attention.kv_scale into k_scale and v_scale (#6081)
|
2024-07-16 15:31:32 -07:00 |
|
Cody Yu
|
160e1d8c99
|
[Misc] Log spec decode metrics (#6454)
|
2024-07-16 20:37:10 +00:00 |
|
Jiaxin Shan
|
94162beb9f
|
[Doc] Fix the lora adapter path in server startup script (#6230)
|
2024-07-16 10:11:04 -07:00 |
|
Woosuk Kwon
|
c467dff24f
|
[Hardware][TPU] Support MoE with Pallas GMM kernel (#6457)
|
2024-07-16 09:56:28 -07:00 |
|
youkaichao
|
9f4ccec761
|
[doc][misc] remind users to cancel debugging environment variables after debugging (#6481)
|
2024-07-16 09:45:30 -07:00 |
|
Cyrus Leung
|
38ef94888a
|
[CI/Build] Remove "boardwalk" image asset (#6460)
|
2024-07-16 08:59:36 -07:00 |
|
Peng Guanwen
|
2bb0489cb3
|
[Core] Use numpy to speed up padded token processing (#6442)
|
2024-07-16 08:13:25 -07:00 |
|
Thomas Parnell
|
7508a3dc34
|
[Misc] Fix typos in spec. decode metrics logging. (#6470)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2024-07-16 13:55:15 +00:00 |
|
sasha0552
|
7a3d2a5b95
|
[Frontend] Support for chat completions input in the tokenize endpoint (#5923)
|
2024-07-16 20:18:09 +08:00 |
|
Cyrus Leung
|
d97011512e
|
[CI/Build] vLLM cache directory for images (#6444)
|
2024-07-15 23:12:25 -07:00 |
|
Woosuk Kwon
|
37d776606f
|
[Docs] Announce 5th meetup (#6458)
|
2024-07-15 21:04:58 -07:00 |
|
Joe
|
d92b3c5cde
|
[Bugfix][CI/Build] Test prompt adapters in openai entrypoint tests (#6419)
|
2024-07-15 18:54:15 -07:00 |
|
Mor Zusman
|
9ad32dacd9
|
[BugFix][Model] Jamba - Handle aborted requests, Add tests and fix cleanup bug (#6425)
Co-authored-by: Mor Zusman <morz@ai21.com>
|
2024-07-16 01:32:55 +00:00 |
|
Kevin H. Luu
|
d6f3b3d5c4
|
Pin sphinx-argparse version (#6453)
Signed-off-by: kevin <kevin@anyscale.com>
|
2024-07-16 01:26:11 +00:00 |
|
Woosuk Kwon
|
4552e37b55
|
[CI/Build][TPU] Add TPU CI test (#6277)
Co-authored-by: kevin <kevin@anyscale.com>
|
2024-07-15 14:31:16 -07:00 |
|
Woosuk Kwon
|
ec9933f4a5
|
[Misc] Add CustomOp Interface to UnquantizedFusedMoEMethod (#6289)
|
2024-07-15 19:02:14 +00:00 |
|
Woosuk Kwon
|
3dee97b05f
|
[Docs] Add Google Cloud to sponsor list (#6450)
|
2024-07-15 11:58:10 -07:00 |
|
youkaichao
|
4cf256ae7f
|
[misc][distributed] fix pp missing layer condition (#6446)
|
2024-07-15 10:32:35 -07:00 |
|
Simon Mo
|
64fdc08c72
|
bump version to v0.5.2 (#6433)
|
2024-07-15 17:27:40 +00:00 |
|
Thomas Parnell
|
4ef95b0f06
|
[Bugfix] use float32 precision in samplers/test_logprobs.py for comparing with HF (#6409)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2024-07-15 13:14:49 -04:00 |
|
Thomas Parnell
|
eaec4b9153
|
[Bugfix] Add custom Triton cache manager to resolve MoE MP issue (#6140)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Chih-Chieh-Yang <chih.chieh.yang@ibm.com>
|
2024-07-15 10:12:47 -07:00 |
|
Pernekhan Utemuratov
|
a63a4c6341
|
[Misc] Use 0.0.9 version for flashinfer (#6447)
Co-authored-by: Pernekhan Utemuratov <pernekhan@deepinfra.com>
|
2024-07-15 10:10:26 -07:00 |
|
Tyler Michael Smith
|
c8fd97f26d
|
[Kernel] Use CUTLASS kernels for the FP8 layers with Bias (#6270)
|
2024-07-15 13:05:52 -04:00 |
|
youkaichao
|
94b82e8c18
|
[doc][distributed] add suggestion for distributed inference (#6418)
|
2024-07-15 09:45:51 -07:00 |
|
Roger Wang
|
6ae1597ddf
|
[VLM] Minor space optimization for ClipVisionModel (#6436)
|
2024-07-15 17:29:51 +08:00 |
|
youkaichao
|
22e79ee8f3
|
[doc][misc] doc update (#6439)
|
2024-07-14 23:33:25 -07:00 |
|
Cyrus Leung
|
de19916314
|
[Bugfix] Convert image to RGB by default (#6430)
|
2024-07-15 05:39:15 +00:00 |
|
youkaichao
|
69672f116c
|
[core][distributed] simplify code to support pipeline parallel (#6406)
|
2024-07-14 21:20:51 -07:00 |
|
DefTruth
|
44874a0bf9
|
[Doc] add env docs for flashinfer backend (#6437)
|
2024-07-14 21:16:51 -07:00 |
|
zifeitong
|
b47008b4d2
|
[BugFix] BatchResponseData body should be optional (#6345)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2024-07-15 04:06:09 +00:00 |
|
Simon Mo
|
9bfece89fd
|
Add FUNDING.yml (#6435)
|
2024-07-14 20:36:16 -07:00 |
|
Simon Mo
|
32c9d7f765
|
Report usage for beam search (#6404)
|
2024-07-14 19:37:35 -07:00 |
|
Fish
|
ccb20db8bd
|
[Bugfix] Benchmark serving script used global parameter 'args' in function 'sample_random_requests' (#6428)
|
2024-07-14 19:27:01 -07:00 |
|
Robert Shaw
|
a754dc2cb9
|
[CI/Build] Cross python wheel (#6394)
|
2024-07-14 18:54:46 -07:00 |
|
Robert Cohn
|
61e85dbad8
|
[Doc] xpu backend requires running setvars.sh (#6393)
|
2024-07-14 17:10:11 -07:00 |
|
Ethan Xu
|
dbfe254eda
|
[Feature] vLLM CLI (#5090)
Co-authored-by: simon-mo <simon.mo@hey.com>
|
2024-07-14 15:36:43 -07:00 |
|
Robert Shaw
|
73030b7dae
|
[ Misc ] Enable Quantizing All Layers of DeekSeekv2 (#6423)
|
2024-07-14 21:38:42 +00:00 |
|
youkaichao
|
ccd3c04571
|
[ci][build] fix commit id (#6420)
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2024-07-14 22:16:21 +08:00 |
|
Tyler Michael Smith
|
9dad5cc859
|
[Kernel] Turn off CUTLASS scaled_mm for Ada Lovelace (#6384)
|
2024-07-14 13:37:19 +00:00 |
|
Yuan Tang
|
6ef3bf912c
|
Remove unnecessary trailing period in spec_decode.rst (#6405)
|
2024-07-14 07:58:09 +00:00 |
|
Isotr0py
|
540c0368b1
|
[Model] Initialize Fuyu-8B support (#3924)
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-07-14 05:27:14 +00:00 |
|
Robert Shaw
|
fb6af8bc08
|
[ Misc ] Apply MoE Refactor to Deepseekv2 To Support Fp8 (#6417)
|
2024-07-13 20:03:58 -07:00 |
|
Woosuk Kwon
|
eeceadaecc
|
[Misc] Add deprecation warning for beam search (#6402)
|
2024-07-13 11:52:22 -07:00 |
|
Robert Shaw
|
babf52dade
|
[ Misc ] More Cleanup of Marlin (#6359)
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
|
2024-07-13 10:21:37 +00:00 |
|