Commit Graph

184 Commits

Author SHA1 Message Date
Wang Ran (汪然)
d189170b6c
remove useless statements (#1408) 2023-10-20 08:52:07 -07:00
Light Lin
f61dc8072f
Fix type hints (#1427) 2023-10-20 08:50:47 -07:00
Woosuk Kwon
f8a1e39fae
[BugFix] Define __eq__ in SequenceGroupOutputs (#1389) 2023-10-17 01:09:44 -07:00
Wang Ran (汪然)
a132435204
Fix typo (#1383) 2023-10-16 21:53:37 -07:00
Woosuk Kwon
c1376e0f82
Change scheduler & input tensor shape (#1381) 2023-10-16 17:48:42 -07:00
Zhuohan Li
651c614aa4
Bump up the version to v0.2.1 (#1355) 2023-10-16 12:58:57 -07:00
Zhuohan Li
9d9072a069
Implement prompt logprobs & Batched topk for computing logprobs (#1328)
Co-authored-by: Yunmo Chen <16273544+wanmok@users.noreply.github.com>
2023-10-16 10:56:50 -07:00
Woosuk Kwon
928de46888
Implement PagedAttention V2 (#1348) 2023-10-16 00:59:57 -07:00
Lu Wang
de89472897
Fix the issue for AquilaChat2-* models (#1339) 2023-10-13 11:51:29 -07:00
Woosuk Kwon
e7c8555d06
Bump up transformers version & Remove MistralConfig (#1254) 2023-10-13 10:05:26 -07:00
Antoni Baum
ec3b5ce9cc
Improve detokenization performance (#1338) 2023-10-13 09:59:07 -07:00
Woosuk Kwon
875afe38ab
Add blacklist in model checkpoint (#1325) 2023-10-12 01:05:37 -07:00
amaleshvemula
ee8217e5be
Add Mistral to quantization model list (#1278) 2023-10-11 00:26:24 -07:00
twaka
8285736840
workaround of AWQ for Turing GPUs (#1252) 2023-10-10 19:48:16 -07:00
yhlskt23
91fce82c6f
change the timing of sorting logits (#1309) 2023-10-10 19:37:42 -07:00
Wang Ran (汪然)
ac5cf86aa6
Fix __repr__ of SequenceOutputs (#1311) 2023-10-10 09:58:28 -07:00
Zhuohan Li
b95ee898fe
[Minor] Fix comment in mistral.py (#1303) 2023-10-09 19:44:37 -07:00
Zhuohan Li
6b5296aa3a
[FIX] Explain why the finished_reason of ignored sequences are length (#1289) 2023-10-08 15:22:38 -07:00
Antoni Baum
ee92b58b3a
Move bfloat16 check to worker (#1259) 2023-10-07 22:10:44 -07:00
Yunfeng Bai
09ff7f106a
API server support ipv4 / ipv6 dualstack (#1288)
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2023-10-07 15:15:54 -07:00
Antoni Baum
acbed3ef40
Use monotonic time where appropriate (#1249) 2023-10-02 19:22:05 -07:00
Federico Cassano
66d18a7fb0
add support for tokenizer revision (#1163)
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2023-10-02 19:19:46 -07:00
Zhuohan Li
ba0bfd40e2
TP/quantization/weight loading refactor part 1 - Simplify parallel linear logic (#1181) 2023-10-02 15:36:09 -07:00
Woosuk Kwon
84e4e37d14
[Minor] Fix type annotations (#1238) 2023-10-02 15:28:31 -07:00
Zhuohan Li
a60b353005
support sharding llama2-70b on more than 8 GPUs (#1209)
Co-authored-by: JiCheng <247153481@qq.com>
2023-10-02 15:26:33 -07:00
Woosuk Kwon
e2fb71ec9f
Bump up the version to v0.2.0 (#1212) 2023-09-28 15:30:38 -07:00
Woosuk Kwon
f936657eb6
Provide default max model length (#1224) 2023-09-28 14:44:02 -07:00
Woosuk Kwon
2e8e49fce3
[Fix] Remove false assertion (#1222) 2023-09-28 10:52:38 -07:00
Woosuk Kwon
a8e98aee0c
Fix Mistral model (#1220) 2023-09-28 10:44:05 -07:00
Chris Bamford
bb1ba58f06
[Mistral] Mistral-7B-v0.1 support (#1196)
Co-authored-by: timlacroix <t@mistral.ai>
2023-09-28 10:41:03 -07:00
Qing
7bedab5748
Add rope_scaling to Qwen (#1210) 2023-09-28 00:49:23 -07:00
Dan Lord
20f7cc4cde
Add skip_special_tokens sampling params (#1186) 2023-09-27 19:21:42 -07:00
Woosuk Kwon
a19bc5c628
Automatically configure max_num_batched_tokens (#1198) 2023-09-27 16:34:00 -07:00
Qing
28e616c4e3
fix qwen-14b model (#1173) 2023-09-27 16:33:16 -07:00
Wang Ran (汪然)
30e775281d
fix typo (#1184)
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2023-09-27 16:22:45 -07:00
Lily Liu
21877b0d75
Support Longchat and RoPE scaling (#555)
Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2023-09-27 03:36:02 -07:00
Antoni Baum
cf5cb1e33e
Allocate more shared memory to attention kernel (#1154) 2023-09-26 22:27:13 -07:00
Woosuk Kwon
03ffd0a022
Add comments on RoPE initialization (#1176) 2023-09-26 10:48:33 -07:00
Wen Sun
bbbf86565f
Align max_tokens behavior with openai (#852) 2023-09-23 18:10:13 -07:00
Woosuk Kwon
9f6be8692e
Fix config for Falcon (#1164) 2023-09-23 17:38:43 -07:00
Zhuohan Li
f187877945
[FIX] Simplify sampler logic (#1156) 2023-09-23 17:21:56 -07:00
Zhuohan Li
947b794146
[Sampler] Vectorized sampling (simplified) (#1048)
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
2023-09-22 17:48:04 -07:00
Ricardo Lu
f98b745a81
feat: support stop_token_ids parameter. (#1097) 2023-09-21 15:34:02 -07:00
Roy
2d1e86f1b1
clean api code, remove redundant background task. (#1102) 2023-09-21 13:25:05 -07:00
Woosuk Kwon
1ac4ccf73c
Add float16 and float32 (#1115) 2023-09-21 00:52:47 -07:00
Woosuk Kwon
2ac4d5e2bf
Replace DtypeTensor (#1123) 2023-09-21 00:51:47 -07:00
Antoni Baum
3302f0aef3
rope_theta and max_position_embeddings from config (#1096)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: wnma3mz <wnma3mz@gmail.com>
2023-09-20 13:35:11 -07:00
Woosuk Kwon
bc0644574c
Add gpu_memory_utilization and swap_space to LLM (#1090) 2023-09-19 22:16:04 -07:00
Woosuk Kwon
400b8289f7
Add pyarrow to dependencies & Print warning on Ray import error (#1094) 2023-09-18 22:36:17 -07:00
Woosuk Kwon
2b1c116b5a
Add minimum capability requirement for AWQ (#1064) 2023-09-18 12:02:01 -07:00