Noam Gat | 555bdcc5a3 | Added logits processor API to sampling params (#1469) | 2023-11-03 14:12:15 -07:00
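This added a `logits_processors` field to `SamplingParams`. A minimal sketch of the kind of callable involved, assuming a `(generated_token_ids, logits) -> logits` convention (check `vllm.SamplingParams` for the exact signature); `ban_token_processor` is illustrative, not part of vLLM:

```python
import torch

def ban_token_processor(banned_id: int):
    """Return a logits processor that forbids one token id.

    Assumes the (generated_token_ids, logits) -> logits calling
    convention; the exact signature is defined by SamplingParams.
    """
    def processor(token_ids: list[int], logits: torch.Tensor) -> torch.Tensor:
        logits[banned_id] = float("-inf")  # token can never be sampled
        return logits
    return processor

# Hypothetical usage:
# params = SamplingParams(logits_processors=[ban_token_processor(50256)])
```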
Antoni Baum | 9738b84a08 | Force paged attention v2 for long contexts (#1510) | 2023-11-01 16:24:32 -07:00
Woosuk Kwon | 1fe0990023 | Remove MPTConfig (#1529) | 2023-11-01 15:29:05 -07:00
Wenfei Yan | cf8849f2d6 | Add MptForCausalLM key in model_loader (#1526) | 2023-10-31 15:46:53 -07:00
Antoni Baum | 15f5632365 | Delay GPU->CPU sync in sampling (#1337) | 2023-10-30 09:01:34 -07:00
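The idea behind delaying the sync, as a generic PyTorch sketch (not the vLLM code itself; requires a CUDA device): keep sampled ids on the GPU and pay the device-to-host transfer once, instead of a blocking `.item()` per sequence.

```python
import torch

probs = torch.rand(8, 32000, device="cuda").softmax(dim=-1)

# Eager version: one .item() per sequence, each a blocking GPU->CPU sync.
# next_ids = [torch.multinomial(p, 1).item() for p in probs]

# Deferred version: sample everything on the GPU first, then do a
# single transfer at the very end, letting the GPU run ahead meanwhile.
next_ids_gpu = torch.multinomial(probs, num_samples=1)
# ... other GPU work can be queued here without waiting ...
next_ids = next_ids_gpu.squeeze(-1).tolist()  # the only sync point
```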
Woosuk Kwon | aa9af07cac | Fix bias in InternLM (#1501) | 2023-10-29 16:24:18 -07:00
ljss | 69be658bba | Support repetition_penalty (#1424) | 2023-10-29 10:02:41 -07:00
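A sketch of the CTRL-style penalty this likely implements (vLLM's actual version is vectorized across sequences; this shows the per-sequence math):

```python
import torch

def apply_repetition_penalty(logits: torch.Tensor,
                             seen_ids: list[int],
                             penalty: float) -> torch.Tensor:
    """CTRL-style repetition penalty (penalty > 1 discourages repeats)."""
    ids = torch.tensor(sorted(set(seen_ids)), device=logits.device)
    seen = logits[ids]
    # Positive logits are divided, negative logits multiplied, so the
    # penalty always pushes already-seen tokens toward lower probability.
    logits[ids] = torch.where(seen > 0, seen / penalty, seen * penalty)
    return logits

logits = apply_repetition_penalty(torch.randn(32000), [1, 5, 5, 9], 1.2)
```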
Qing | 28b47d1e49 | Add rope_scaling to Aquila model (#1457) | 2023-10-29 04:25:21 -07:00
chooper1 | 1f24755bf8 | Support SqueezeLLM (#1326) | 2023-10-21 23:14:59 -07:00
    Co-authored-by: squeeze-ai-lab <squeezeailab.bair@gmail.com>
    Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Wang Ran (汪然) | d189170b6c | Remove useless statements (#1408) | 2023-10-20 08:52:07 -07:00
Wang Ran (汪然) | a132435204 | Fix typo (#1383) | 2023-10-16 21:53:37 -07:00
Woosuk Kwon | c1376e0f82 | Change scheduler & input tensor shape (#1381) | 2023-10-16 17:48:42 -07:00
Zhuohan Li | 9d9072a069 | Implement prompt logprobs & batched top-k for computing logprobs (#1328) | 2023-10-16 10:56:50 -07:00
    Co-authored-by: Yunmo Chen <16273544+wanmok@users.noreply.github.com>
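The batching here can be illustrated with plain PyTorch: a single `topk` over the whole batch replaces per-sequence calls.

```python
import torch

logits = torch.randn(4, 32000)                 # [num_seqs, vocab_size]
logprobs = torch.log_softmax(logits, dim=-1)
# One batched call instead of a Python loop over sequences.
top_logprobs, top_token_ids = logprobs.topk(k=5, dim=-1)
```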
Woosuk Kwon | 928de46888 | Implement PagedAttention V2 (#1348) | 2023-10-16 00:59:57 -07:00
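V2 splits each sequence's keys/values into partitions, computes attention per partition, and then reduces the partial results. A numerically equivalent toy version of that split-then-reduce structure (single query, single head; the real kernel works on paged KV blocks in CUDA):

```python
import torch

def partitioned_attention(q: torch.Tensor, k: torch.Tensor,
                          v: torch.Tensor, part_size: int) -> torch.Tensor:
    """Single-query attention computed over key partitions, then
    reduced with a log-sum-exp rescale (the V2 structure)."""
    maxes, sums, outs = [], [], []
    for s in range(0, k.shape[0], part_size):
        scores = (q @ k[s:s + part_size].T) / k.shape[1] ** 0.5
        m = scores.max()
        e = torch.exp(scores - m)
        maxes.append(m)
        sums.append(e.sum())
        outs.append(e @ v[s:s + part_size])
    m_all = torch.stack(maxes).max()
    rescale = torch.exp(torch.stack(maxes) - m_all)   # per-partition factor
    denom = (torch.stack(sums) * rescale).sum()
    num = (torch.stack(outs) * rescale[:, None]).sum(dim=0)
    return num / denom

q = torch.randn(64)
k, v = torch.randn(1000, 64), torch.randn(1000, 64)
assert torch.allclose(partitioned_attention(q, k, v, 256),
                      torch.softmax(q @ k.T / 8.0, dim=-1) @ v, atol=1e-5)
```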
Lu Wang | de89472897 | Fix the issue for AquilaChat2-* models (#1339) | 2023-10-13 11:51:29 -07:00
Woosuk Kwon | e7c8555d06 | Bump up transformers version & remove MistralConfig (#1254) | 2023-10-13 10:05:26 -07:00
Woosuk Kwon | 875afe38ab | Add blacklist in model checkpoint (#1325) | 2023-10-12 01:05:37 -07:00
amaleshvemula | ee8217e5be | Add Mistral to quantization model list (#1278) | 2023-10-11 00:26:24 -07:00
twaka | 8285736840 | Workaround of AWQ for Turing GPUs (#1252) | 2023-10-10 19:48:16 -07:00
yhlskt23 | 91fce82c6f | Change the timing of sorting logits (#1309) | 2023-10-10 19:37:42 -07:00
Zhuohan Li | b95ee898fe | [Minor] Fix comment in mistral.py (#1303) | 2023-10-09 19:44:37 -07:00
Zhuohan Li | ba0bfd40e2 | TP/quantization/weight loading refactor part 1 - Simplify parallel linear logic (#1181) | 2023-10-02 15:36:09 -07:00
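The parallel-linear split this refactor simplifies can be shown without `torch.distributed`: column-parallel layers shard the weight along the output dimension, row-parallel layers along the input dimension, and the row-parallel partial outputs are summed (the all-reduce in the real code). A sketch with two simulated ranks:

```python
import torch

x = torch.randn(2, 8)                 # [batch, in_features]
w1 = torch.randn(8, 16)               # column-parallel layer
w2 = torch.randn(16, 8)               # row-parallel layer

# Column parallel: shard w1 along the output dim; each "rank" produces
# a slice of the activations, and no communication is needed when the
# next layer is row-parallel.
h_parts = [x @ shard for shard in w1.chunk(2, dim=1)]

# Row parallel: shard w2 along the input dim; the partial outputs are
# summed, which is what the all-reduce does in the real implementation.
y = sum(h @ shard for h, shard in zip(h_parts, w2.chunk(2, dim=0)))

assert torch.allclose(y, (x @ w1) @ w2, atol=1e-5)
```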
Woosuk Kwon | 84e4e37d14 | [Minor] Fix type annotations (#1238) | 2023-10-02 15:28:31 -07:00
Zhuohan Li | a60b353005 | Support sharding llama2-70b on more than 8 GPUs (#1209) | 2023-10-02 15:26:33 -07:00
    Co-authored-by: JiCheng <247153481@qq.com>
Woosuk Kwon | a8e98aee0c | Fix Mistral model (#1220) | 2023-09-28 10:44:05 -07:00
Chris Bamford | bb1ba58f06 | [Mistral] Mistral-7B-v0.1 support (#1196) | 2023-09-28 10:41:03 -07:00
    Co-authored-by: timlacroix <t@mistral.ai>
Qing | 7bedab5748 | Add rope_scaling to Qwen (#1210) | 2023-09-28 00:49:23 -07:00
Qing | 28e616c4e3 | Fix qwen-14b model (#1173) | 2023-09-27 16:33:16 -07:00
Lily Liu | 21877b0d75 | Support Longchat and RoPE scaling (#555) | 2023-09-27 03:36:02 -07:00
    Co-authored-by: Wing Lian <wing.lian@gmail.com>
    Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
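Linear RoPE scaling of the kind used by LongChat amounts to compressing positions before building the cos/sin cache. A sketch, with `scaling_factor` as the knob:

```python
import torch

def rope_cache(seq_len: int, dim: int, base: float = 10000.0,
               scaling_factor: float = 1.0):
    """Build the RoPE cos/sin cache with linear position scaling:
    positions are divided by `scaling_factor`, so a model trained on
    2k positions can address 8k positions with factor 4."""
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    positions = torch.arange(seq_len).float() / scaling_factor  # the only change
    freqs = torch.outer(positions, inv_freq)
    return freqs.cos(), freqs.sin()

cos, sin = rope_cache(seq_len=8192, dim=128, scaling_factor=4.0)
```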
Woosuk Kwon | 03ffd0a022 | Add comments on RoPE initialization (#1176) | 2023-09-26 10:48:33 -07:00
Zhuohan Li | f187877945 | [FIX] Simplify sampler logic (#1156) | 2023-09-23 17:21:56 -07:00
Zhuohan Li | 947b794146 | [Sampler] Vectorized sampling (simplified) (#1048) | 2023-09-22 17:48:04 -07:00
    Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
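Vectorized sampling in miniature: one batched `multinomial` call covers the whole batch instead of a per-sequence Python loop.

```python
import torch

logits = torch.randn(16, 32000)              # [num_seqs, vocab_size]
probs = torch.softmax(logits / 0.8, dim=-1)  # temperature 0.8
# A single batched call samples a token for every sequence at once.
next_tokens = torch.multinomial(probs, num_samples=1).squeeze(-1)
```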
Antoni Baum | 3302f0aef3 | Read rope_theta and max_position_embeddings from config (#1096) | 2023-09-20 13:35:11 -07:00
    Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
    Co-authored-by: wnma3mz <wnma3mz@gmail.com>
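Reading these values from a Hugging Face config, with fallbacks for older checkpoints that predate the fields (the model name is just an example):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("meta-llama/Llama-2-7b-hf")
# Fall back to the historical defaults when a checkpoint predates
# these config fields.
rope_theta = getattr(config, "rope_theta", 10000.0)
max_pos = getattr(config, "max_position_embeddings", 2048)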
Woosuk Kwon | 2b1c116b5a | Add minimum capability requirement for AWQ (#1064) | 2023-09-18 12:02:01 -07:00
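A capability gate of this kind can be written with `torch.cuda.get_device_capability()`; the `75` (Turing) cutoff below is illustrative, not necessarily the value this commit chose:

```python
import torch

# Refuse to run AWQ kernels below a minimum compute capability.
major, minor = torch.cuda.get_device_capability()
if major * 10 + minor < 75:
    raise ValueError("GPU is too old for AWQ quantization kernels")
```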
Woosuk Kwon | cc796b1358 | Convert before transpose (#1073) | 2023-09-18 11:51:48 -07:00
Zhuohan Li | 90979c38f8 | [FIX] Don't initialize parameter by default (#1067) | 2023-09-17 17:15:38 -07:00
Woosuk Kwon | e3e79e9e8a | Implement AWQ quantization support for LLaMA (#1032) | 2023-09-16 00:03:37 -07:00
    Co-authored-by: Robert Irvine <robert@seamlessml.com>
    Co-authored-by: root <rirv938@gmail.com>
    Co-authored-by: Casper <casperbh.96@gmail.com>
    Co-authored-by: julian-q <julianhquevedo@gmail.com>
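AWQ stores 4-bit weights with per-group scales and zero points. A sketch of the dequantization math only (the real kernels unpack eight 4-bit values per int32 and fuse this into the matmul):

```python
import torch

def dequantize_group(q: torch.Tensor, scale: torch.Tensor,
                     zero: torch.Tensor) -> torch.Tensor:
    """Dequantize one group of 4-bit weights: w = (q - z) * s."""
    return (q.float() - zero.float()) * scale.float()

q = torch.randint(0, 16, (128,))   # one group of unpacked 4-bit values
w = dequantize_group(q, scale=torch.tensor(0.02), zero=torch.tensor(8.0))
```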
Zhuohan Li | f04908cae7 | [FIX] Minor bug fixes (#1035) | 2023-09-13 16:38:12 -07:00
Jasmond L | ab019eea75 | Add Model Revision Support (#1014) | 2023-09-13 15:20:02 -07:00
    Co-authored-by: Jasmond Loh <Jasmond.Loh@hotmail.com>
    Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
Woosuk Kwon | e67b4f2c2a | Use FP32 in RoPE initialization (#1004) | 2023-09-11 00:26:35 -07:00
    Co-authored-by: One <imone@tuta.io>
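Why FP32 matters here: the inverse-frequency exponentiation loses precision in half precision, so the cache is built in float32 and cast to the model dtype afterwards. A sketch:

```python
import torch

dim, base = 128, 10000.0
# Computing in float16 directly would lose precision in the
# exponentiation; build in float32, then cast the finished cache.
inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
cache = torch.outer(torch.arange(4096, dtype=torch.float32), inv_freq)
cos, sin = cache.cos().to(torch.float16), cache.sin().to(torch.float16)
```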
Antoni Baum | a62de9ecfd | Fix wrong dtype in PagedAttentionWithALiBi bias (#996) | 2023-09-09 14:58:35 -07:00
    Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
Robert Irvine | 4b5bcf8906 | Faster startup of vLLM (#982) | 2023-09-08 14:48:54 +09:00
    Co-authored-by: Robert Irvine <robert@seamlessml.com>
Zhuohan Li | c957c741d9 | Enable safetensors loading for all models (#974) | 2023-09-07 15:49:52 -07:00
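Loading a safetensors checkpoint is a one-liner with the `safetensors` library; unlike `torch.load`, it involves no pickle execution.

```python
from safetensors.torch import load_file

state_dict = load_file("model.safetensors")  # path is illustrative
```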
Antoni Baum | 005ba458b5 | Set torch default dtype in a context manager (#971) | 2023-09-07 15:39:37 +09:00
    Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
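A context manager along these lines (vLLM's helper may differ in name and details):

```python
import contextlib
import torch

@contextlib.contextmanager
def default_dtype(dtype: torch.dtype):
    """Temporarily change the dtype new tensors are created with,
    restoring the previous default even if construction raises."""
    old = torch.get_default_dtype()
    torch.set_default_dtype(dtype)
    try:
        yield
    finally:
        torch.set_default_dtype(old)

with default_dtype(torch.float16):
    layer = torch.nn.Linear(4, 4)   # weights created directly in fp16
```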
Woosuk Kwon | 320a622ec4 | [BugFix] Implement RoPE for GPT-J (#941) | 2023-09-06 11:54:33 +09:00
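GPT-J uses the interleaved ("rotate every two") RoPE convention rather than the half-split one used by LLaMA/GPT-NeoX. A simplified sketch (the real model applies this only to the first `rotary_dim` dimensions):

```python
import torch

def rotate_every_two(x: torch.Tensor) -> torch.Tensor:
    """GPT-J pairs adjacent dims (x0,x1),(x2,x3),... unlike the
    half-split ("rotate_half") convention of GPT-NeoX/LLaMA."""
    x1, x2 = x[..., ::2], x[..., 1::2]
    return torch.stack((-x2, x1), dim=-1).flatten(-2)

def apply_gptj_rope(x: torch.Tensor, positions: torch.Tensor,
                    base: float = 10000.0) -> torch.Tensor:
    dim = x.shape[-1]
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    freqs = torch.outer(positions.float(), inv_freq)
    cos = freqs.cos().repeat_interleave(2, dim=-1)
    sin = freqs.sin().repeat_interleave(2, dim=-1)
    return x * cos + rotate_every_two(x) * sin

x = torch.randn(16, 64)                      # [seq_len, head_dim]
x_rot = apply_gptj_rope(x, torch.arange(16))
```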
Zhuohan Li | 002800f081 | Align vLLM's beam search implementation with HF generate (#857) | 2023-09-04 17:29:42 -07:00
Dong-Yong Lee | e11222333f | Fix bug when penalties are negative (#913) | 2023-09-01 00:37:17 +09:00
    Co-authored-by: dongyong-lee <dongyong.lee@navercorp.com>
Aman Gupta Karmani | 28873a2799 | Improve _prune_hidden_states micro-benchmark (#707) | 2023-08-31 13:28:43 +09:00
JFDuan | 0d93f15694 | Accelerate LLaMA model loading (#234) | 2023-08-30 01:00:13 -07:00
Aman Gupta Karmani | 75471386de | Use flash-attn via xformers (#877) | 2023-08-29 21:52:13 -07:00
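xFormers exposes the fused kernel via `memory_efficient_attention`; a minimal usage sketch (requires a CUDA GPU and the `xformers` package):

```python
import torch
import xformers.ops as xops

# q, k, v: [batch, seq_len, num_heads, head_dim]
q = torch.randn(1, 1024, 32, 128, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)
# Dispatches to a FlashAttention-style fused kernel when one is
# available for the given dtype, shape, and GPU.
out = xops.memory_efficient_attention(q, k, v)
```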