Qing | 28b47d1e49 | Add rope_scaling to Aquila model (#1457) | 2023-10-29 04:25:21 -07:00
chooper1 | 1f24755bf8 | Support SqueezeLLM (#1326) | 2023-10-21 23:14:59 -07:00
    Co-authored-by: squeeze-ai-lab <squeezeailab.bair@gmail.com>
    Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Thiago Salvatore | bf31d3606a | Pin pydantic dependency versions (#1429) | 2023-10-21 11:18:58 -07:00
Wang Ran (汪然) | d189170b6c | remove useless statements (#1408) | 2023-10-20 08:52:07 -07:00
Light Lin | f61dc8072f | Fix type hints (#1427) | 2023-10-20 08:50:47 -07:00
Woosuk Kwon | f8a1e39fae | [BugFix] Define __eq__ in SequenceGroupOutputs (#1389) | 2023-10-17 01:09:44 -07:00
Wang Ran (汪然) | a132435204 | Fix typo (#1383) | 2023-10-16 21:53:37 -07:00
Woosuk Kwon | 9524867701 | Add Mistral 7B to test_models (#1366) | 2023-10-16 17:49:54 -07:00
Woosuk Kwon | c1376e0f82 | Change scheduler & input tensor shape (#1381) | 2023-10-16 17:48:42 -07:00
Zhuohan Li | 651c614aa4 | Bump up the version to v0.2.1 (#1355) | 2023-10-16 12:58:57 -07:00
Woosuk Kwon | d3a5bd9fb7 | Fix sampler test (#1379) | 2023-10-16 12:57:26 -07:00
Woosuk Kwon | e8ef4c0820 | Fix PyTorch index URL in workflow (#1378) | 2023-10-16 12:37:56 -07:00
Woosuk Kwon | 348897af31 | Fix PyTorch version to 2.0.1 in workflow (#1377) | 2023-10-16 11:27:17 -07:00
Zhuohan Li | 9d9072a069 | Implement prompt logprobs & Batched topk for computing logprobs (#1328) | 2023-10-16 10:56:50 -07:00
    Co-authored-by: Yunmo Chen <16273544+wanmok@users.noreply.github.com>
Woosuk Kwon | 928de46888 | Implement PagedAttention V2 (#1348) | 2023-10-16 00:59:57 -07:00
Woosuk Kwon | 29678cd213 | Minor fix on AWQ kernel launch (#1356) | 2023-10-15 21:53:56 -07:00
Woosuk Kwon | d0740dff1b | Fix error message on TORCH_CUDA_ARCH_LIST (#1239) | 2023-10-14 14:47:43 -07:00
    Co-authored-by: Yunfeng Bai <yunfeng.bai@scale.com>
Lu Wang | de89472897 | Fix the issue for AquilaChat2-* models (#1339) | 2023-10-13 11:51:29 -07:00
Woosuk Kwon | e7c8555d06 | Bump up transformers version & Remove MistralConfig (#1254) | 2023-10-13 10:05:26 -07:00
Antoni Baum | ec3b5ce9cc | Improve detokenization performance (#1338) | 2023-10-13 09:59:07 -07:00
ldwang | 6368e777a8 | Add Aquila2 to README (#1331) | 2023-10-12 12:11:16 -07:00
    Signed-off-by: ldwang <ftgreat@gmail.com>
    Co-authored-by: ldwang <ftgreat@gmail.com>
Woosuk Kwon | 875afe38ab | Add blacklist in model checkpoint (#1325) | 2023-10-12 01:05:37 -07:00
amaleshvemula | ee8217e5be | Add Mistral to quantization model list (#1278) | 2023-10-11 00:26:24 -07:00
CHU Tianxiang | 980dd4a2c4 | Fix overflow in awq kernel (#1295) | 2023-10-11 00:19:53 -07:00
    Co-authored-by: 楚天翔 <tianxiang.ctx@alibaba-inc.com>
twaka | 8285736840 | workaround of AWQ for Turing GPUs (#1252) | 2023-10-10 19:48:16 -07:00
yhlskt23 | 91fce82c6f | change the timing of sorting logits (#1309) | 2023-10-10 19:37:42 -07:00
Wang Ran (汪然) | ac5cf86aa6 | Fix __repr__ of SequenceOutputs (#1311) | 2023-10-10 09:58:28 -07:00
yanxiyue | 6a6119554c | lock torch version to 2.0.1 (#1290) | 2023-10-10 09:21:57 -07:00
Zhuohan Li | b95ee898fe | [Minor] Fix comment in mistral.py (#1303) | 2023-10-09 19:44:37 -07:00
Zhuohan Li | 9eed4d1f3e | Update README.md (#1292) | 2023-10-08 23:15:50 -07:00
Zhuohan Li | 6b5296aa3a | [FIX] Explain why the finished_reason of ignored sequences are length (#1289) | 2023-10-08 15:22:38 -07:00
Antoni Baum | ee92b58b3a | Move bfloat16 check to worker (#1259) | 2023-10-07 22:10:44 -07:00
Yunfeng Bai | 09ff7f106a | API server support ipv4 / ipv6 dualstack (#1288) | 2023-10-07 15:15:54 -07:00
    Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
Antoni Baum | acbed3ef40 | Use monotonic time where appropriate (#1249) | 2023-10-02 19:22:05 -07:00
Federico Cassano | 66d18a7fb0 | add support for tokenizer revision (#1163) | 2023-10-02 19:19:46 -07:00
    Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
Zhuohan Li | ba0bfd40e2 | TP/quantization/weight loading refactor part 1 - Simplify parallel linear logic (#1181) | 2023-10-02 15:36:09 -07:00
Woosuk Kwon | 84e4e37d14 | [Minor] Fix type annotations (#1238) | 2023-10-02 15:28:31 -07:00
Zhuohan Li | a60b353005 | support sharding llama2-70b on more than 8 GPUs (#1209) | 2023-10-02 15:26:33 -07:00
    Co-authored-by: JiCheng <247153481@qq.com>
Liang | ebe4d1db3a | Fix boundary check in paged attention kernel (#1241) | 2023-10-01 11:35:06 -07:00
kg6-sleipnir | b5a10eb0ef | Added dtype arg to benchmarks (#1228) | 2023-09-30 21:04:03 -07:00
Usama Ahmed | 0967102c6d | fixing typo in tiiuae/falcon-rw-7b model name (#1226) | 2023-09-29 13:40:25 -07:00
Woosuk Kwon | e2fb71ec9f | Bump up the version to v0.2.0 (#1212) | 2023-09-28 15:30:38 -07:00
Woosuk Kwon | f936657eb6 | Provide default max model length (#1224) | 2023-09-28 14:44:02 -07:00
Woosuk Kwon | 6f88f762bf | Fix OOM in attention kernel test (#1223) | 2023-09-28 14:33:24 -07:00
Woosuk Kwon | 202351d5bf | Add Mistral to supported model list (#1221) | 2023-09-28 14:33:04 -07:00
Woosuk Kwon | 2e8e49fce3 | [Fix] Remove false assertion (#1222) | 2023-09-28 10:52:38 -07:00
Woosuk Kwon | a8e98aee0c | Fix Mistral model (#1220) | 2023-09-28 10:44:05 -07:00
Chris Bamford | bb1ba58f06 | [Mistral] Mistral-7B-v0.1 support (#1196) | 2023-09-28 10:41:03 -07:00
    Co-authored-by: timlacroix <t@mistral.ai>
Qing | 7bedab5748 | Add rope_scaling to Qwen (#1210) | 2023-09-28 00:49:23 -07:00
Dan Lord | 20f7cc4cde | Add skip_special_tokens sampling params (#1186) | 2023-09-27 19:21:42 -07:00