Commit Graph

601 Commits

Author SHA1 Message Date
youkaichao
c391e4b68e
[Core] improve robustness of pynccl (#3860) 2024-04-04 16:52:12 -07:00
Saurabh Dash
9117f892f0
[Model] Cohere CommandR+ (#3829) 2024-04-04 13:31:49 -07:00
youkaichao
ca81ff5196
[Core] manage nccl via a pypi package & upgrade to pt 2.2.1 (#3805) 2024-04-04 10:26:19 -07:00
Matthias Gerstgrasser
aabe8f40f2
[Core] [Frontend] Make detokenization optional (#3749)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2024-04-03 21:52:18 -07:00
Woosuk Kwon
498eb5cfa3
[Bugfix] Add kv_scale input parameter to CPU backend (#3840) 2024-04-04 04:33:08 +00:00
Michael Feil
537ee25f43
[Core] Enable hf_transfer by default if available (#3817) 2024-04-04 04:02:43 +00:00
Tao He
294f8f6665
[BugFix] Pass tokenizer_config to local_tokenizer_group (#3754)
Signed-off-by: Tao He <sighingnow@gmail.com>
2024-04-03 20:31:46 -07:00
Adrian Abeyta
2ff767b513
Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290)
Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: HaiShaw <hixiao@gmail.com>
Co-authored-by: AdrianAbeyta <Adrian.Abeyta@amd.com>
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>
Co-authored-by: root <root@gt-pla-u18-08.pla.dcgpu>
Co-authored-by: mawong-amd <156021403+mawong-amd@users.noreply.github.com>
Co-authored-by: ttbachyinsda <ttbachyinsda@outlook.com>
Co-authored-by: guofangze <guofangze@kuaishou.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: jacobthebanana <50071502+jacobthebanana@users.noreply.github.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-04-03 14:15:55 -07:00
SangBin Cho
3dcb3e8b98
[3/N] Refactor scheduler for chunked prefill scheduling (#3550) 2024-04-03 14:13:49 -07:00
Nick Hill
c9b506dad4
[BugFix] Use different mechanism to get vllm version in is_cpu() (#3804) 2024-04-02 23:06:25 -07:00
Cade Daniel
5757d90e26
[Speculative decoding] Adding configuration object for speculative decoding (#3706)
Co-authored-by: Lily Liu <lilyliupku@gmail.com>
2024-04-03 00:40:57 +00:00
youkaichao
a3c226e7eb
[CI/Build] 0.4.0.post1, fix sm 7.0/7.5 binary (#3803) 2024-04-02 12:57:04 -07:00
Michael Goin
b321d4881b
[Bugfix] Add __init__.py files for vllm/core/block/ and vllm/spec_decode/ (#3798) 2024-04-02 12:35:31 -07:00
leiwen83
ad6eca408b
Fix early CUDA init via get_architecture_class_name import (#3770)
Signed-off-by: Lei Wen <wenlei03@qiyi.com>
Co-authored-by: Lei Wen <wenlei03@qiyi.com>
2024-04-02 11:56:26 -07:00
A-Mahla
0739b1947f
[Frontend][Bugfix] allow using the default middleware with a root path (#3788)
Co-authored-by: A-Mahla <>
2024-04-02 01:20:28 -07:00
bigPYJ1151
0e3f06fe9c
[Hardware][Intel] Add CPU inference backend (#3634)
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: Yuan Zhou <yuan.zhou@intel.com>
2024-04-01 22:07:30 -07:00
Qubitium
7d4e1b85e7
[Misc] Add support for new autogptq checkpoint_format (#3689)
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
2024-04-01 19:32:01 -04:00
Cade Daniel
93deb0b38f
[Speculative decoding 4/9] Lookahead scheduling for speculative decoding (#3250) 2024-04-01 22:55:24 +00:00
Nick Hill
49782fcb76
[Misc] Some minor simplifications to detokenization logic (#3670)
Some simplifications made for clarity.

Also moves detokenization-related functions from tokenizer.py to detokenizer.py.
2024-04-01 13:22:06 -07:00
youkaichao
203d4f82ac
[Core][Bugfix] cache len of tokenizer (#3741) 2024-03-29 18:46:39 -07:00
Nick Hill
991143cfcd
[BugFix] Use consistent logger everywhere (#3738) 2024-03-29 23:26:44 +00:00
Simon Mo
8b2d3cbc1b
usage lib get version another way (#3735) 2024-03-29 15:57:08 -07:00
Hongxia Yang
9765b5c406
[ROCm][Bugfix] Fixed several bugs related to rccl path and attention selector logic (#3699) 2024-03-29 14:52:36 -07:00
Simon Mo
430530fc18
bump version to v0.4.0 (#3712) 2024-03-29 12:28:33 -07:00
Roger Wang
97356f3c7e
[Bugfix] Command-R Max Model Length (#3727) 2024-03-29 12:27:51 -07:00
Roy
f510395bbf
[BugFix][Frontend] Fix completion logprobs=0 error (#3731) 2024-03-29 09:38:21 -07:00
Roy
6110c39dc8
[BugFix] Fix tokenizer out of vocab size (#3685) 2024-03-29 08:18:59 -07:00
yhu422
d8658c8cc1
Usage Stats Collection (#2852) 2024-03-28 22:16:12 -07:00
youkaichao
756b30a5f3
[Core][Test] move local_rank to the last arg with default value(#3711)
[Core][Test] move local_rank to the last arg with default value to keep api compatible (#3711)
2024-03-28 21:19:45 -07:00
Woosuk Kwon
395aa823ea
[Misc] Minor type annotation fix (#3716) 2024-03-28 21:12:24 -07:00
youkaichao
f342153b48
Revert "bump version to v0.4.0" (#3708) 2024-03-28 18:49:42 -07:00
Simon Mo
27a57cad52
bump version to v0.4.0 (#3705) 2024-03-28 18:26:51 -07:00
youkaichao
0267fef52a
[Core] fix del of communicator (#3702) 2024-03-29 00:24:58 +00:00
Simon Mo
4716a32dd4
fix logging msg for block manager (#3701) 2024-03-28 23:29:55 +00:00
Woosuk Kwon
cb40b3ab6b
[Kernel] Add MoE Triton kernel configs for A100 40GB (#3700) 2024-03-28 15:26:24 -07:00
Roy
515386ef3c
[Core] Support multi-node inference(eager and cuda graph) (#3686) 2024-03-28 15:01:55 -07:00
Adam Boeglin
1715056fef
[Bugfix] Update neuron_executor.py to add optional vision_language_config (#3695) 2024-03-28 10:43:34 -07:00
SangBin Cho
b51c1cc9d2
[2/N] Chunked prefill data update (#3538) 2024-03-28 10:06:01 -07:00
Roger Wang
ce567a2926
[Kernel] DBRX Triton MoE kernel H100 (#3692) 2024-03-28 10:05:34 -07:00
wenyujin333
d6ea427f04
[Model] Add support for Qwen2MoeModel (#3346) 2024-03-28 15:19:59 +00:00
Cade Daniel
14ccd94c89
[Core][Bugfix]Refactor block manager for better testability (#3492) 2024-03-27 23:59:28 -07:00
Woosuk Kwon
8267b06c30
[Kernel] Add Triton MoE kernel configs for DBRX on A100 (#3679) 2024-03-27 22:22:25 -07:00
hxer7963
098e1776ba
[Model] Add support for xverse (#3610)
Co-authored-by: willhe <hexin@xverse.cn>
Co-authored-by: root <root@localhost.localdomain>
2024-03-27 18:12:54 -07:00
Roy
10e6322283
[Model] Fix and clean commandr (#3671) 2024-03-28 00:20:00 +00:00
zeppombal
1182607e18
Add support for Cohere's Command-R model (#3433)
Co-authored-by: José Maria Pombal <jose.pombal@unbabel.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2024-03-27 14:19:32 -07:00
Megha Agarwal
e24336b5a7
[Model] Add support for DBRX (#3660) 2024-03-27 13:01:46 -07:00
youkaichao
d18f4e73f3
[Bugfix] [Hotfix] fix nccl library name (#3661) 2024-03-27 17:23:54 +00:00
Woosuk Kwon
82c540bebf
[Bugfix] More faithful implementation of Gemma (#3653) 2024-03-27 09:37:18 -07:00
youkaichao
8f44facddd
[Core] remove cupy dependency (#3625) 2024-03-27 00:33:26 -07:00
Woosuk Kwon
e66b629c04
[Misc] Minor fix in KVCache type (#3652) 2024-03-26 23:14:06 -07:00