Commit Graph

  • e8ddc08ec8
    [BUG FIX] Upgrade fschat version to 0.2.23 (#650) YHPeter 2023-08-02 17:05:59 -0400
  • 1b0bd0fe8a
    Add Falcon support (new) (#592) Zhuohan Li 2023-08-02 14:04:39 -0700
  • 20044cab7a
    Fix log message in scheduler (#652) Lily Liu 2023-08-02 13:35:10 -0700
  • 64f23c2900
    fix Baichuan position embeddings, which differ between the 7B and 13B models (#643) Song 2023-08-02 13:22:51 +0800
  • d4c7755ca8
    fix Baichuan-7B tensor parallelism (#598) Qing 2023-08-02 06:41:36 +0800
  • aa39e42c5a
    fix doc (#622) Chaofan Lin 2023-08-01 04:11:57 +0800
  • 953f28cf9a
    fix ModuleNotFoundError (#599) Fang li 2023-07-30 11:52:41 +0800
  • c0d00f5be6
    [Fix] fix import error of RayWorker (#604) (#605) Xudong Zhang 2023-07-28 14:37:40 +0800
  • 58a072be15
    [Fix] Add model sequence length into model config (#575) Zhuohan Li 2023-07-25 23:46:30 -0700
  • 82ad323dee
    [Fix] Add chat completion Example and simplify dependencies (#576) Zhuohan Li 2023-07-25 23:45:48 -0700
  • df5dd3c68e
    Add Baichuan-7B to README (#494) Zhuohan Li 2023-07-25 15:25:12 -0700
  • 2d867b55fa
    fixed "tensor parallel is not defined" error (#564) MoeedDar 2023-07-25 22:16:51 +0100
  • d7a1c6d614
    Fix paged attention testing. (#495) Tao Peng 2023-07-25 12:01:56 +0800
  • 7d5a155e4a
    [Fix] Fix GPTBigCode for distributed execution (#503) Zhuohan Li 2023-07-24 18:36:33 -0700
  • 1dde34e0f8
    Fix "GPTJConfig has no attribute rotary" error (#532) leegohi04517 2023-07-25 02:29:30 +0800
  • 6fc2a38b11
    Add support for LLaMA-2 (#505) Zhuohan Li 2023-07-20 11:38:27 -0700
  • c487a221ee
    Fix bad assert in initialize_cluster if PG already exists (#526) Antoni Baum 2023-07-19 23:17:12 -0700
  • 9925c17940
    Ray placement group support (#397) Antoni Baum 2023-07-19 22:49:31 -0700
  • 8c4b2592fb
    fix: enable trust-remote-code in api server & benchmark. (#509) Ricardo Lu 2023-07-20 08:06:15 +0800
  • cf21a9bd5c
    support trust_remote_code in benchmark (#518) WRH 2023-07-20 08:02:40 +0800
  • 16c3e295a8
    fix(ray_utils): ignore re-init error (#465) Massimiliano Pronesti 2023-07-20 02:01:19 +0200
  • bda41c70dd
    hotfix: attention ALiBi without head mapping (#496) Song 2023-07-19 02:31:48 +0800
  • 453bafb96f
    Merge pull request #498 from MoeedDar/main Lily Liu 2023-07-18 09:22:56 -0700
  • 328d231c17
    Fixed old name reference for max_seq_len MoeedDar 2023-07-18 16:47:59 +0100
  • b4b195b360
    fix max seq len (#489) Lily Liu 2023-07-17 23:20:20 -0700
  • 20b0d88d16
    Add support for Baichuan (#365) codethazine 2023-07-17 21:50:55 +0100
  • 2bdea7ac11
    [Fix] Fix the condition of max_seq_len (#477) Zhuohan Li 2023-07-17 00:33:48 -0400
  • 58df2883cb
    [Doc] Add doc for running vLLM on the cloud (#426) Zhanghao Wu 2023-07-16 13:37:14 -0700
  • 6d7d95a70a
    Offload port selection to OS (#467) Zhangir Azerbayev 2023-07-16 02:11:02 -0400
  • 96853af5a8
    Optimize MQA Kernel (#452) Zhuohan Li 2023-07-14 20:06:40 -0400
  • dbed69058c
    Fix the KeyError when loading bloom-based models (#441) Wen Sun 2023-07-14 12:58:09 +0800
  • 7b6ae94059
    add vocab padding for LLaMA (supports WizardLM) (#411) panda 2023-07-14 11:56:22 +0800
  • c6dfc3cdbe
    Fix handling of special tokens in decoding. (#418) xcnick 2023-07-12 23:14:56 +0800
  • 51be365143
    fix: freeze pydantic to v1 (#429) Keming 2023-07-12 23:10:55 +0800
  • c894836108
    [Model] Add support for GPT-J (#226) Andre Slavescu 2023-07-08 20:55:16 -0400
  • 75beba29b5
    Don't try to load training_args.bin (#373) Fazlul Shahriar 2023-07-08 18:26:28 -0400
  • ddfdf470ae
    Add trust_remote_code arg to get_config (#405) Woosuk Kwon 2023-07-08 15:24:17 -0700
  • b6fbb9a565
    Sort the outputs before return (#402) Woosuk Kwon 2023-07-08 14:48:18 -0700
  • 2179e4f4c5
    avoid python list copy in sequence initialization (#401) Lily Liu 2023-07-08 12:42:08 -0700
  • a945fcc2ae
    Add trust-remote-code flag to handle remote tokenizers (#364) codethazine 2023-07-07 20:04:58 +0200
  • be54f8e5c4
    [Fix] Change /generate response-type to json for non-streaming (#374) Nicolas Frenay 2023-07-06 20:15:17 -0500
  • b396cb4998
    fix: send [DONE] only once in streaming responses. (#378) Ricardo Lu 2023-07-07 09:08:40 +0800
  • 1c395b4eaa
    Bump up the version (#300) Woosuk Kwon 2023-07-04 21:41:53 -0700
  • 3d64cf019e
    [Server] Use the fastchat.model.model_adapter.get_conversation_template method to get the model template (#357) akxxsb 2023-07-05 12:39:59 +0800
  • 98fe8cb542
    [Server] Add option to specify chat template for chat endpoint (#345) Zhuohan Li 2023-07-03 23:01:56 -0700
  • ffa6d2f9f9
    [Docs] Fix typo (#346) Woosuk Kwon 2023-07-03 16:51:47 -0700
  • 404422f42e
    [Model] Add support for MPT (#334) Woosuk Kwon 2023-07-03 16:47:53 -0700
  • 7717d0838b
    Fix an endless loop issue when engine_step throws a RuntimeError (#339) coolcloudcol 2023-07-04 06:22:28 +0800
  • 42e0c1df78
    [Quality] Add CI for formatting (#343) Zhuohan Li 2023-07-03 14:50:56 -0700
  • e41f06702c
    Add support for BLOOM (#331) Woosuk Kwon 2023-07-03 13:12:35 -0700
  • d6fa1be3a8
    [Quality] Add code formatter and linter (#326) Zhuohan Li 2023-07-03 11:31:55 -0700
  • 0ffded812a
    [Fix] Better error message for batched prompts (#342) Zhuohan Li 2023-07-03 09:27:31 -0700
  • 0bd2a573a5
    Allow sending a list of strings as the prompt to the OpenAI demo endpoint /v1/completions (#323) Michele Catalano 2023-07-03 18:17:50 +0200
  • 49b26e2cec
    feat: add ChatCompletion endpoint in OpenAI demo server. (#330) Ricardo Lu 2023-07-03 13:54:33 +0800
  • dafd924c1f
    Raise error for long prompt (#273) Lily Liu 2023-06-30 18:48:49 -0700
  • 598dc4b79a
    [Fix] Weight loading for GPTBigCode (#313) Zhuohan Li 2023-06-29 22:14:17 -0700
  • 85de093472
    [Fix] Do not pin memory when in WSL (#312) Zhuohan Li 2023-06-29 15:00:21 -0700
  • f72297562f
    Add news for the vllm+skypilot example (#314) Zhanghao Wu 2023-06-29 12:32:37 -0700
  • 9d27b09d12
    Update README.md (#306) Bayang 2023-06-29 14:52:15 +0100
  • 998d9d1509
    [Tokenizer] Add tokenizer mode (#298) Woosuk Kwon 2023-06-28 14:19:22 -0700
  • 425040d4c1
    remove floats == 0 comparison (#285) Lily Liu 2023-06-28 14:11:51 -0700
  • 4338cc4750
    [Tokenizer] Add an option to specify tokenizer (#284) Woosuk Kwon 2023-06-28 09:46:58 -0700
  • bdd6b4c8bc
    Add LLM.set_tokenizer (#283) Jishnu Ray Chowdhury 2023-06-28 02:28:29 -0500
  • 2b7d3aca2e
    Update setup.py (#282) Cody Yu 2023-06-27 14:34:23 -0700
  • 4026a049d3
    expand coverage of gpt2 model loading (#271) twaka 2023-06-27 22:27:41 +0900
  • 43710e8d09
    [Fix] Fix default port number in benchmark scripts (#265) Zhuohan Li 2023-06-26 13:15:35 -0700
  • 526df28fb2
    [BugFix] Fix a bug in counting running sequences (#266) Woosuk Kwon 2023-06-26 13:09:02 -0700
  • 2cf1a333b6
    [Doc] Documentation for distributed inference (#261) Zhuohan Li 2023-06-26 11:34:23 -0700
  • 0b7db411b5
    [Bug] Fix the OOM condition for CPU cache (#260) Zhuohan Li 2023-06-26 11:16:13 -0700
  • 471a7a4566
    Make compatible with the Decapoda Research LLaMA HF version (#251) BasicCoder 2023-06-27 00:23:57 +0800
  • 6214dd6ce9
    Update README.md (#236) Lianmin Zheng 2023-06-25 16:58:06 -0700
  • 0603379863
    fix incorrect use of getattr to get a dict value (#232) metacryptom 2023-06-25 13:00:24 +0800
  • 665c48963b
    [Docs] Add GPTBigCode to supported models (#213) Woosuk Kwon 2023-06-22 15:05:11 -0700
  • 298695b766
    GPTBigCode (StarCoder, SantaCoder Support) (#209) Michael Feil 2023-06-22 19:49:27 +0200
  • 83658c8ace
    Bump up version to 0.1.1 (#204) Zhuohan Li 2023-06-22 15:33:32 +0800
  • 1d24ccb96c
    [Fix] Better error message when there is OOM during cache initialization (#203) Zhuohan Li 2023-06-22 15:30:06 +0800
  • 14f0b39cda
    [Bugfix] Fix a bug in RequestOutput.finished (#202) Woosuk Kwon 2023-06-22 00:17:24 -0700
  • 2e0d314384
    fix-ray (#193) Zhuohan Li 2023-06-22 00:21:41 +0800
  • 67d96c29fb
    Use slow tokenizer for OpenLLaMA models (#168) Woosuk Kwon 2023-06-19 23:19:47 -0700
  • 033f5c78f5
    Remove e.g. in README (#167) Zhuohan Li 2023-06-20 14:00:28 +0800
  • 794e578de0
    [Minor] Fix URLs (#166) Woosuk Kwon 2023-06-19 22:57:14 -0700
  • caddfc14c1
    [Minor] Fix icons in doc (#165) Woosuk Kwon 2023-06-19 20:35:38 -0700
  • fc72e39de3
    Change image urls (#164) Zhuohan Li 2023-06-20 11:15:15 +0800
  • b7e62d3454
    Fix repo & documentation URLs (#163) Woosuk Kwon 2023-06-19 20:03:40 -0700
  • 364536acd1
    [Docs] Minor fix (#162) Woosuk Kwon 2023-06-19 19:58:23 -0700
  • 0b32a987dd
    Add and list supported models in README (#161) Zhuohan Li 2023-06-20 10:57:46 +0800
  • 570fb2e9cc
    [PyPI] Fix package info in setup.py (#158) Woosuk Kwon 2023-06-19 18:05:01 -0700
  • a255885f83
    Add logo and polish readme (#156) Zhuohan Li 2023-06-19 16:31:13 +0800
  • 5822ede66e
    Add performance figures for dark mode (#160) Woosuk Kwon 2023-06-18 23:46:24 -0700
  • 0370afa2e5
    Remove benchmark_async_llm_server.py (#155) Zhuohan Li 2023-06-19 11:12:37 +0800
  • 7e2a913c64
    [Minor] Fix CompletionOutput.__repr__ (#157) Woosuk Kwon 2023-06-18 19:58:25 -0700
  • 3f92038b99
    Add comments on swap space (#154) Woosuk Kwon 2023-06-18 11:39:35 -0700
  • dcda03b4cb
    Write README and front page of doc (#147) Woosuk Kwon 2023-06-18 03:19:38 -0700
  • bf5f121c02
    Reduce GPU memory utilization to make sure OOM doesn't happen (#153) Zhuohan Li 2023-06-18 17:33:50 +0800
  • bec7b2dc26
    Add quickstart guide (#148) Zhuohan Li 2023-06-18 01:26:12 +0800
  • 0b98ba15c7
    Change the name to vLLM (#150) Woosuk Kwon 2023-06-17 03:07:40 -0700
  • e5464ee484
    Rename servers to engines (#152) Zhuohan Li 2023-06-17 17:25:21 +0800
  • bab8f3dd0d
    [Minor] Fix benchmark_throughput.py (#151) Woosuk Kwon 2023-06-16 21:00:52 -0700
  • eedb46bf03
    Rename servers and change port numbers to reduce confusion (#149) Zhuohan Li 2023-06-17 00:13:02 +0800
  • 311490a720
    Add script for benchmarking serving throughput (#145) Woosuk Kwon 2023-06-14 19:55:38 -0700