Commit Graph

431 Commits

Jared Roesch  79a30912b8  2023-10-30 14:50:47 -07:00
    Add py.typed so consumers of vLLM can get type checking (#1509)
    Co-authored-by: aarnphm <29749331+aarnphm@users.noreply.github.com>
    Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>

Adam Brusselback  2f3d36a8a1  2023-10-30 10:02:21 -07:00
    Fix logging so we actually get info level entries in the log. (#1494)

iongpt  ac8d36f3e5  2023-10-30 09:14:37 -07:00
    Refactor LLMEngine demo script for clarity and modularity (#1413)
    Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>

Antoni Baum  15f5632365  2023-10-30 09:01:34 -07:00
    Delay GPU->CPU sync in sampling (#1337)

Woosuk Kwon  aa9af07cac  2023-10-29 16:24:18 -07:00
    Fix bias in InternLM (#1501)

ljss  69be658bba  2023-10-29 10:02:41 -07:00
    Support repetition_penalty (#1424)

Ricardo Lu  beac8dd461  2023-10-29 04:26:36 -07:00
    fix: don't skip first special token. (#1497)

Qing  28b47d1e49  2023-10-29 04:25:21 -07:00
    Add rope_scaling to Aquila model (#1457)

chooper1  1f24755bf8  2023-10-21 23:14:59 -07:00
    Support SqueezeLLM (#1326)
    Co-authored-by: squeeze-ai-lab <squeezeailab.bair@gmail.com>
    Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

Thiago Salvatore  bf31d3606a  2023-10-21 11:18:58 -07:00
    Pin pydantic dependency versions (#1429)

Wang Ran (汪然)  d189170b6c  2023-10-20 08:52:07 -07:00
    remove useless statements (#1408)

Light Lin  f61dc8072f  2023-10-20 08:50:47 -07:00
    Fix type hints (#1427)

Woosuk Kwon  f8a1e39fae  2023-10-17 01:09:44 -07:00
    [BugFix] Define __eq__ in SequenceGroupOutputs (#1389)

Wang Ran (汪然)  a132435204  2023-10-16 21:53:37 -07:00
    Fix typo (#1383)

Woosuk Kwon  9524867701  2023-10-16 17:49:54 -07:00
    Add Mistral 7B to test_models (#1366)

Woosuk Kwon  c1376e0f82  2023-10-16 17:48:42 -07:00
    Change scheduler & input tensor shape (#1381)

Zhuohan Li  651c614aa4  2023-10-16 12:58:57 -07:00
    Bump up the version to v0.2.1 (#1355)

Woosuk Kwon  d3a5bd9fb7  2023-10-16 12:57:26 -07:00
    Fix sampler test (#1379)

Woosuk Kwon  e8ef4c0820  2023-10-16 12:37:56 -07:00
    Fix PyTorch index URL in workflow (#1378)

Woosuk Kwon  348897af31  2023-10-16 11:27:17 -07:00
    Fix PyTorch version to 2.0.1 in workflow (#1377)

Zhuohan Li  9d9072a069  2023-10-16 10:56:50 -07:00
    Implement prompt logprobs & Batched topk for computing logprobs (#1328)
    Co-authored-by: Yunmo Chen <16273544+wanmok@users.noreply.github.com>

Woosuk Kwon  928de46888  2023-10-16 00:59:57 -07:00
    Implement PagedAttention V2 (#1348)

Woosuk Kwon  29678cd213  2023-10-15 21:53:56 -07:00
    Minor fix on AWQ kernel launch (#1356)

Woosuk Kwon  d0740dff1b  2023-10-14 14:47:43 -07:00
    Fix error message on TORCH_CUDA_ARCH_LIST (#1239)
    Co-authored-by: Yunfeng Bai <yunfeng.bai@scale.com>

Lu Wang  de89472897  2023-10-13 11:51:29 -07:00
    Fix the issue for AquilaChat2-* models (#1339)

Woosuk Kwon  e7c8555d06  2023-10-13 10:05:26 -07:00
    Bump up transformers version & Remove MistralConfig (#1254)

Antoni Baum  ec3b5ce9cc  2023-10-13 09:59:07 -07:00
    Improve detokenization performance (#1338)

ldwang  6368e777a8  2023-10-12 12:11:16 -07:00
    Add Aquila2 to README (#1331)
    Signed-off-by: ldwang <ftgreat@gmail.com>
    Co-authored-by: ldwang <ftgreat@gmail.com>

Woosuk Kwon  875afe38ab  2023-10-12 01:05:37 -07:00
    Add blacklist in model checkpoint (#1325)

amaleshvemula  ee8217e5be  2023-10-11 00:26:24 -07:00
    Add Mistral to quantization model list (#1278)

CHU Tianxiang  980dd4a2c4  2023-10-11 00:19:53 -07:00
    Fix overflow in awq kernel (#1295)
    Co-authored-by: 楚天翔 <tianxiang.ctx@alibaba-inc.com>

twaka  8285736840  2023-10-10 19:48:16 -07:00
    workaround of AWQ for Turing GPUs (#1252)

yhlskt23  91fce82c6f  2023-10-10 19:37:42 -07:00
    change the timing of sorting logits (#1309)

Wang Ran (汪然)  ac5cf86aa6  2023-10-10 09:58:28 -07:00
    Fix __repr__ of SequenceOutputs (#1311)

yanxiyue  6a6119554c  2023-10-10 09:21:57 -07:00
    lock torch version to 2.0.1 (#1290)

Zhuohan Li  b95ee898fe  2023-10-09 19:44:37 -07:00
    [Minor] Fix comment in mistral.py (#1303)

Zhuohan Li  9eed4d1f3e  2023-10-08 23:15:50 -07:00
    Update README.md (#1292)

Zhuohan Li  6b5296aa3a  2023-10-08 15:22:38 -07:00
    [FIX] Explain why the finished_reason of ignored sequences are length (#1289)

Antoni Baum  ee92b58b3a  2023-10-07 22:10:44 -07:00
    Move bfloat16 check to worker (#1259)

Yunfeng Bai  09ff7f106a  2023-10-07 15:15:54 -07:00
    API server support ipv4 / ipv6 dualstack (#1288)
    Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>

Antoni Baum  acbed3ef40  2023-10-02 19:22:05 -07:00
    Use monotonic time where appropriate (#1249)

Federico Cassano  66d18a7fb0  2023-10-02 19:19:46 -07:00
    add support for tokenizer revision (#1163)
    Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>

Zhuohan Li  ba0bfd40e2  2023-10-02 15:36:09 -07:00
    TP/quantization/weight loading refactor part 1 - Simplify parallel linear logic (#1181)

Woosuk Kwon  84e4e37d14  2023-10-02 15:28:31 -07:00
    [Minor] Fix type annotations (#1238)

Zhuohan Li  a60b353005  2023-10-02 15:26:33 -07:00
    support sharding llama2-70b on more than 8 GPUs (#1209)
    Co-authored-by: JiCheng <247153481@qq.com>

Liang  ebe4d1db3a  2023-10-01 11:35:06 -07:00
    Fix boundary check in paged attention kernel (#1241)

kg6-sleipnir  b5a10eb0ef  2023-09-30 21:04:03 -07:00
    Added dtype arg to benchmarks (#1228)

Usama Ahmed  0967102c6d  2023-09-29 13:40:25 -07:00
    fixing typo in tiiuae/falcon-rw-7b model name (#1226)

Woosuk Kwon  e2fb71ec9f  2023-09-28 15:30:38 -07:00
    Bump up the version to v0.2.0 (#1212)

Woosuk Kwon  f936657eb6  2023-09-28 14:44:02 -07:00
    Provide default max model length (#1224)