youkaichao
|
c391e4b68e
|
[Core] improve robustness of pynccl (#3860)
|
2024-04-04 16:52:12 -07:00 |
|
Saurabh Dash
|
9117f892f0
|
[Model] Cohere CommandR+ (#3829)
|
2024-04-04 13:31:49 -07:00 |
|
youkaichao
|
ca81ff5196
|
[Core] manage nccl via a pypi package & upgrade to pt 2.2.1 (#3805)
|
2024-04-04 10:26:19 -07:00 |
|
Matthias Gerstgrasser
|
aabe8f40f2
|
[Core] [Frontend] Make detokenization optional (#3749)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
|
2024-04-03 21:52:18 -07:00 |
|
Woosuk Kwon
|
498eb5cfa3
|
[Bugfix] Add kv_scale input parameter to CPU backend (#3840)
|
2024-04-04 04:33:08 +00:00 |
|
Michael Feil
|
537ee25f43
|
[Core] Enable hf_transfer by default if available (#3817)
|
2024-04-04 04:02:43 +00:00 |
|
Tao He
|
294f8f6665
|
[BugFix] Pass tokenizer_config to local_tokenizer_group (#3754)
Signed-off-by: Tao He <sighingnow@gmail.com>
|
2024-04-03 20:31:46 -07:00 |
|
Adrian Abeyta
|
2ff767b513
|
Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290)
Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: HaiShaw <hixiao@gmail.com>
Co-authored-by: AdrianAbeyta <Adrian.Abeyta@amd.com>
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>
Co-authored-by: root <root@gt-pla-u18-08.pla.dcgpu>
Co-authored-by: mawong-amd <156021403+mawong-amd@users.noreply.github.com>
Co-authored-by: ttbachyinsda <ttbachyinsda@outlook.com>
Co-authored-by: guofangze <guofangze@kuaishou.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: jacobthebanana <50071502+jacobthebanana@users.noreply.github.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-04-03 14:15:55 -07:00 |
|
SangBin Cho
|
3dcb3e8b98
|
[3/N] Refactor scheduler for chunked prefill scheduling (#3550)
|
2024-04-03 14:13:49 -07:00 |
|
Nick Hill
|
c9b506dad4
|
[BugFix] Use different mechanism to get vllm version in is_cpu() (#3804)
|
2024-04-02 23:06:25 -07:00 |
|
Cade Daniel
|
5757d90e26
|
[Speculative decoding] Adding configuration object for speculative decoding (#3706)
Co-authored-by: Lily Liu <lilyliupku@gmail.com>
|
2024-04-03 00:40:57 +00:00 |
|
youkaichao
|
a3c226e7eb
|
[CI/Build] 0.4.0.post1, fix sm 7.0/7.5 binary (#3803)
|
2024-04-02 12:57:04 -07:00 |
|
Michael Goin
|
b321d4881b
|
[Bugfix] Add __init__.py files for vllm/core/block/ and vllm/spec_decode/ (#3798)
|
2024-04-02 12:35:31 -07:00 |
|
leiwen83
|
ad6eca408b
|
Fix early CUDA init via get_architecture_class_name import (#3770)
Signed-off-by: Lei Wen <wenlei03@qiyi.com>
Co-authored-by: Lei Wen <wenlei03@qiyi.com>
|
2024-04-02 11:56:26 -07:00 |
|
A-Mahla
|
0739b1947f
|
[Frontend][Bugfix] allow using the default middleware with a root path (#3788)
Co-authored-by: A-Mahla <>
|
2024-04-02 01:20:28 -07:00 |
|
bigPYJ1151
|
0e3f06fe9c
|
[Hardware][Intel] Add CPU inference backend (#3634)
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: Yuan Zhou <yuan.zhou@intel.com>
|
2024-04-01 22:07:30 -07:00 |
|
Qubitium
|
7d4e1b85e7
|
[Misc] Add support for new autogptq checkpoint_format (#3689)
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
|
2024-04-01 19:32:01 -04:00 |
|
Cade Daniel
|
93deb0b38f
|
[Speculative decoding 4/9] Lookahead scheduling for speculative decoding (#3250)
|
2024-04-01 22:55:24 +00:00 |
|
Nick Hill
|
49782fcb76
|
[Misc] Some minor simplifications to detokenization logic (#3670)
Some simplifications made for clarity.
Also moves detokenization-related functions from tokenizer.py to detokenizer.py.
|
2024-04-01 13:22:06 -07:00 |
|
youkaichao
|
203d4f82ac
|
[Core][Bugfix] cache len of tokenizer (#3741)
|
2024-03-29 18:46:39 -07:00 |
|
Nick Hill
|
991143cfcd
|
[BugFix] Use consistent logger everywhere (#3738)
|
2024-03-29 23:26:44 +00:00 |
|
Simon Mo
|
8b2d3cbc1b
|
usage lib get version another way (#3735)
|
2024-03-29 15:57:08 -07:00 |
|
Hongxia Yang
|
9765b5c406
|
[ROCm][Bugfix] Fixed several bugs related to rccl path and attention selector logic (#3699)
|
2024-03-29 14:52:36 -07:00 |
|
Simon Mo
|
430530fc18
|
bump version to v0.4.0 (#3712)
|
2024-03-29 12:28:33 -07:00 |
|
Roger Wang
|
97356f3c7e
|
[Bugfix] Command-R Max Model Length (#3727)
|
2024-03-29 12:27:51 -07:00 |
|
Roy
|
f510395bbf
|
[BugFix][Frontend] Fix completion logprobs=0 error (#3731)
|
2024-03-29 09:38:21 -07:00 |
|
Roy
|
6110c39dc8
|
[BugFix] Fix tokenizer out of vocab size (#3685)
|
2024-03-29 08:18:59 -07:00 |
|
yhu422
|
d8658c8cc1
|
Usage Stats Collection (#2852)
|
2024-03-28 22:16:12 -07:00 |
|
youkaichao
|
756b30a5f3
|
[Core][Test] move local_rank to the last arg with default value to keep api compatible (#3711)
|
2024-03-28 21:19:45 -07:00 |
|
Woosuk Kwon
|
395aa823ea
|
[Misc] Minor type annotation fix (#3716)
|
2024-03-28 21:12:24 -07:00 |
|
youkaichao
|
f342153b48
|
Revert "bump version to v0.4.0" (#3708)
|
2024-03-28 18:49:42 -07:00 |
|
Simon Mo
|
27a57cad52
|
bump version to v0.4.0 (#3705)
|
2024-03-28 18:26:51 -07:00 |
|
youkaichao
|
0267fef52a
|
[Core] fix del of communicator (#3702)
|
2024-03-29 00:24:58 +00:00 |
|
Simon Mo
|
4716a32dd4
|
fix logging msg for block manager (#3701)
|
2024-03-28 23:29:55 +00:00 |
|
Woosuk Kwon
|
cb40b3ab6b
|
[Kernel] Add MoE Triton kernel configs for A100 40GB (#3700)
|
2024-03-28 15:26:24 -07:00 |
|
Roy
|
515386ef3c
|
[Core] Support multi-node inference(eager and cuda graph) (#3686)
|
2024-03-28 15:01:55 -07:00 |
|
Adam Boeglin
|
1715056fef
|
[Bugfix] Update neuron_executor.py to add optional vision_language_config (#3695)
|
2024-03-28 10:43:34 -07:00 |
|
SangBin Cho
|
b51c1cc9d2
|
[2/N] Chunked prefill data update (#3538)
|
2024-03-28 10:06:01 -07:00 |
|
Roger Wang
|
ce567a2926
|
[Kernel] DBRX Triton MoE kernel H100 (#3692)
|
2024-03-28 10:05:34 -07:00 |
|
wenyujin333
|
d6ea427f04
|
[Model] Add support for Qwen2MoeModel (#3346)
|
2024-03-28 15:19:59 +00:00 |
|
Cade Daniel
|
14ccd94c89
|
[Core][Bugfix]Refactor block manager for better testability (#3492)
|
2024-03-27 23:59:28 -07:00 |
|
Woosuk Kwon
|
8267b06c30
|
[Kernel] Add Triton MoE kernel configs for DBRX on A100 (#3679)
|
2024-03-27 22:22:25 -07:00 |
|
hxer7963
|
098e1776ba
|
[Model] Add support for xverse (#3610)
Co-authored-by: willhe <hexin@xverse.cn>
Co-authored-by: root <root@localhost.localdomain>
|
2024-03-27 18:12:54 -07:00 |
|
Roy
|
10e6322283
|
[Model] Fix and clean commandr (#3671)
|
2024-03-28 00:20:00 +00:00 |
|
zeppombal
|
1182607e18
|
Add support for Cohere's Command-R model (#3433)
Co-authored-by: José Maria Pombal <jose.pombal@unbabel.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2024-03-27 14:19:32 -07:00 |
|
Megha Agarwal
|
e24336b5a7
|
[Model] Add support for DBRX (#3660)
|
2024-03-27 13:01:46 -07:00 |
|
youkaichao
|
d18f4e73f3
|
[Bugfix] [Hotfix] fix nccl library name (#3661)
|
2024-03-27 17:23:54 +00:00 |
|
Woosuk Kwon
|
82c540bebf
|
[Bugfix] More faithful implementation of Gemma (#3653)
|
2024-03-27 09:37:18 -07:00 |
|
youkaichao
|
8f44facddd
|
[Core] remove cupy dependency (#3625)
|
2024-03-27 00:33:26 -07:00 |
|
Woosuk Kwon
|
e66b629c04
|
[Misc] Minor fix in KVCache type (#3652)
|
2024-03-26 23:14:06 -07:00 |
|