Harry Mellor
08133c4d1a
Add SSL arguments to API servers ( #2109 )
2023-12-18 10:56:23 +08:00
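For context, a minimal sketch of launching the OpenAI-compatible server with TLS enabled. The --ssl-keyfile / --ssl-certfile flag names follow uvicorn conventions and are an assumption here; verify against the server's --help:

```python
# Sketch: start the OpenAI-compatible server over HTTPS.
# The SSL flag names are assumed (uvicorn-style); verify with --help.
import subprocess

subprocess.run([
    "python", "-m", "vllm.entrypoints.openai.api_server",
    "--model", "facebook/opt-125m",   # placeholder model
    "--ssl-keyfile", "server.key",    # private key path (assumed flag name)
    "--ssl-certfile", "server.crt",   # certificate path (assumed flag name)
])
```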
Woosuk Kwon
30fb0956df
[Minor] Add more detailed explanation on quantization argument ( #2145 )
2023-12-17 01:56:16 -08:00
Woosuk Kwon
37ca558103
Optimize model execution with CUDA graph ( #1926 )
...
Co-authored-by: Chen Shen <scv119@gmail.com>
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
2023-12-16 21:12:08 -08:00
CHU Tianxiang
0fbfc4b81b
Add GPTQ support ( #916 )
2023-12-15 03:04:22 -08:00
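A minimal sketch of what this enables through the quantization argument; the model ID is a placeholder for any GPTQ checkpoint:

```python
from vllm import LLM, SamplingParams

# Load a GPTQ-quantized checkpoint (model ID is a placeholder).
llm = LLM(model="TheBloke/Llama-2-7B-GPTQ", quantization="gptq")
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```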
Simon Mo
2e8fc0d4c3
Fix completion API echo and logprob combo ( #1992 )
2023-12-10 13:20:30 -08:00
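The request shape this fix concerns, sketched with the plain HTTP API; host, port, and model name are assumptions:

```python
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",  # assumed host/port
    json={
        "model": "facebook/opt-125m",  # placeholder model
        "prompt": "San Francisco is a",
        "echo": True,       # echo the prompt back with the completion
        "logprobs": 1,      # request per-token log probabilities
        "max_tokens": 16,
    },
)
print(resp.json()["choices"][0])
```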
Jin Shang
1aa1361510
Fix OpenAI server completion_tokens referenced before assignment ( #1996 )
2023-12-09 21:01:21 -08:00
Roy
60dc62dc9e
add custom server params ( #1868 )
2023-12-03 12:59:18 -08:00
Simon Mo
5313c2cb8b
Add Production Metrics in Prometheus format ( #1890 )
2023-12-02 16:37:44 -08:00
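A quick way to inspect the new metrics, assuming the conventional /metrics path and the default port:

```python
import requests

# Scrape the Prometheus-format metrics (path and port assumed).
print(requests.get("http://localhost:8000/metrics").text)
```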
Adam Brusselback
66785cc05c
Support chat template and echo for chat API ( #1756 )
2023-11-30 16:43:13 -08:00
Michael McCulloch
c782195662
Fix disable-log-requests so it actually disables logging of requests. ( #1779 )

...
Co-authored-by: Michael McCulloch <mjm.gitlab@fastmail.com>
2023-11-29 21:50:02 -08:00
Yunmo Chen
665cbcec4b
Add echo function to OpenAI API server. ( #1504 )
2023-11-26 21:29:17 -08:00
Simon Mo
5ffc0d13a2
Migrate linter from pylint to ruff ( #1665 )
2023-11-20 11:58:01 -08:00
liuyhwangyh
edb305584b
Support downloading models from www.modelscope.cn ( #1588 )
2023-11-17 20:38:31 -08:00
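A sketch of the switch this adds; the VLLM_USE_MODELSCOPE environment variable and the model ID are assumptions to verify against the docs:

```python
import os

# Assumed mechanism: route model downloads to www.modelscope.cn
# instead of the Hugging Face Hub (verify the variable name in the docs).
os.environ["VLLM_USE_MODELSCOPE"] = "True"

from vllm import LLM

llm = LLM(model="qwen/Qwen-7B-Chat", trust_remote_code=True)  # placeholder model ID
```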
Iskren Ivov Chernev
686f5e3210
Return usage for OpenAI streaming requests ( #1663 )
2023-11-16 15:28:36 -08:00
Fluder-Paradyne
7e90a2d117
Add /health endpoint for both servers ( #1540 )
2023-11-01 10:29:44 -07:00
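A minimal liveness probe against the new endpoint (host and port assumed):

```python
import requests

# Returns 200 when the server is up and able to serve requests.
assert requests.get("http://localhost:8000/health").status_code == 200
```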
Dan Lord
7013a80170
Add support for spaces_between_special_tokens
2023-10-30 16:52:56 -07:00
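A sketch of the new sampling parameter; setting it to False keeps special tokens glued to the surrounding text during detokenization:

```python
from vllm import SamplingParams

# Don't insert spaces around special tokens when detokenizing.
params = SamplingParams(spaces_between_special_tokens=False)
```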
Yunfeng Bai
09ff7f106a
API server support ipv4 / ipv6 dualstack ( #1288 )
...
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2023-10-07 15:15:54 -07:00
Antoni Baum
acbed3ef40
Use monotonic time where appropriate ( #1249 )
2023-10-02 19:22:05 -07:00
Federico Cassano
66d18a7fb0
add support for tokenizer revision ( #1163 )
...
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2023-10-02 19:19:46 -07:00
Woosuk Kwon
f936657eb6
Provide default max model length ( #1224 )
2023-09-28 14:44:02 -07:00
Dan Lord
20f7cc4cde
Add skip_special_tokens sampling params ( #1186 )
2023-09-27 19:21:42 -07:00
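For reference, the parameter added here, sketched with the library API:

```python
from vllm import SamplingParams

# Keep special tokens (e.g. an end-of-sequence marker) in the output text.
params = SamplingParams(skip_special_tokens=False)
```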
Wen Sun
bbbf86565f
Align max_tokens behavior with OpenAI ( #852 )
2023-09-23 18:10:13 -07:00
Ricardo Lu
f98b745a81
feat: support stop_token_ids parameter. ( #1097 )
2023-09-21 15:34:02 -07:00
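A sketch of the new parameter; the token ID below is a placeholder for your tokenizer's actual IDs:

```python
from vllm import SamplingParams

# Stop generation as soon as any listed token ID is produced.
params = SamplingParams(stop_token_ids=[2], max_tokens=128)  # placeholder ID
```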
Roy
2d1e86f1b1
Clean API code, remove redundant background task. ( #1102 )
2023-09-21 13:25:05 -07:00
Woosuk Kwon
bc0644574c
Add gpu_memory_utilization and swap_space to LLM ( #1090 )
2023-09-19 22:16:04 -07:00
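A sketch of the two new LLM arguments; the values are illustrative:

```python
from vllm import LLM

llm = LLM(
    model="facebook/opt-125m",    # placeholder model
    gpu_memory_utilization=0.90,  # fraction of GPU memory to reserve
    swap_space=4,                 # CPU swap space, in GiB
)
```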
orellavie1212
fbe66e1d0b
Add support for quantization in the LLM module ( #1080 )
2023-09-18 11:04:21 -07:00
Lukas Kreussel
b5f93d0631
Only fail if logit_bias has actual values ( #1045 )
2023-09-14 17:33:01 -07:00
Jasmond L
ab019eea75
Add Model Revision Support ( #1014 )
...
Co-authored-by: Jasmond Loh <Jasmond.Loh@hotmail.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2023-09-13 15:20:02 -07:00
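A sketch of pinning a checkpoint revision; the revision value is a placeholder:

```python
from vllm import LLM

# Pin the model to a specific branch, tag, or commit on the Hub.
llm = LLM(model="facebook/opt-125m", revision="main")  # placeholder revision
```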
Antoni Baum
080438477f
Start background task in AsyncLLMEngine.generate ( #988 )
...
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2023-09-08 00:03:39 -07:00
Antoni Baum
c07ece5ca4
Make AsyncLLMEngine more robust & fix batched abort ( #969 )
...
Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
Co-authored-by: Avnish Narayan <38871737+avnishn@users.noreply.github.com>
2023-09-07 13:43:45 -07:00
Antoni Baum
1696725879
Initialize AsyncLLMEngine background loop correctly ( #943 )
2023-09-04 17:41:22 -07:00
lplcor
becd7a56f1
Enable request body OpenAPI spec for OpenAI endpoints ( #865 )
2023-08-29 21:54:08 -07:00
WanMok
e06f504a76
Support tokens and arrays of tokens as inputs to the OpenAI completion API ( #715 )
2023-08-11 12:14:34 -07:00
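The input shapes this enables, sketched over HTTP; the token IDs are placeholders:

```python
import requests

# The prompt may now be a string, a list of strings, a list of token IDs,
# or a list of token-ID lists (IDs below are placeholders).
resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={"model": "facebook/opt-125m", "prompt": [1, 3087, 2335], "max_tokens": 8},
)
```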
Nicolas Basile
66c54aa9c3
Check the max prompt length for the OpenAI completions API ( #472 )
2023-08-08 17:43:49 -07:00
YHPeter
e8ddc08ec8
[BUG FIX] upgrade fschat version to 0.2.23 ( #650 )
...
Co-authored-by: hao.yu <hao.yu@cn-c017.server.mila.quebec>
2023-08-02 14:05:59 -07:00
Zhuohan Li
58a072be15
[Fix] Add model sequence length into model config ( #575 )
2023-07-25 23:46:30 -07:00
Zhuohan Li
82ad323dee
[Fix] Add chat completion Example and simplify dependencies ( #576 )
2023-07-25 23:45:48 -07:00
Ricardo Lu
8c4b2592fb
fix: enable trust-remote-code in api server & benchmark. ( #509 )
2023-07-19 17:06:15 -07:00
Woosuk Kwon
b6fbb9a565
Sort the outputs before returning ( #402 )
2023-07-08 14:48:18 -07:00
codethazine
a945fcc2ae
Add trust-remote-code flag to handle remote tokenizers ( #364 )
2023-07-07 11:04:58 -07:00
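A sketch of the flag's library-side counterpart; the model ID is a placeholder for a checkpoint that ships custom code:

```python
from vllm import LLM

# Opt in to running the model/tokenizer code bundled with the checkpoint.
llm = LLM(model="mosaicml/mpt-7b", trust_remote_code=True)  # placeholder model
```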
Nicolas Frenay
be54f8e5c4
[Fix] Change /generate response-type to json for non-streaming ( #374 )
2023-07-06 18:15:17 -07:00
Ricardo Lu
b396cb4998
fix: only send [DONE] once when streaming responses. ( #378 )
2023-07-06 18:08:40 -07:00
akxxsb
3d64cf019e
[Server] Use fastchat.model.model_adapter.get_conversation_template to get the model's conversation template ( #357 )
2023-07-04 21:39:59 -07:00
Zhuohan Li
98fe8cb542
[Server] Add option to specify chat template for chat endpoint ( #345 )
2023-07-03 23:01:56 -07:00
Zhuohan Li
42e0c1df78
[Quality] Add CI for formatting ( #343 )
2023-07-03 14:50:56 -07:00
Zhuohan Li
d6fa1be3a8
[Quality] Add code formatter and linter ( #326 )
2023-07-03 11:31:55 -07:00
Zhuohan Li
0ffded812a
[Fix] Better error message for batched prompts ( #342 )
2023-07-03 09:27:31 -07:00
Michele Catalano
0bd2a573a5
Allow sending a list of str as the prompt to the OpenAI demo endpoint /v1/completions ( #323 )
...
* allow str or List[str] for prompt
* Update vllm/entrypoints/openai/api_server.py
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
---------
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2023-07-03 09:17:50 -07:00
Ricardo Lu
49b26e2cec
feat: add ChatCompletion endpoint in OpenAI demo server. ( #330 )
2023-07-02 22:54:33 -07:00
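A minimal call against the new endpoint (host, port, and model are assumptions):

```python
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",  # assumed host/port
    json={
        "model": "facebook/opt-125m",  # placeholder model
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```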
Woosuk Kwon
998d9d1509
[Tokenizer] Add tokenizer mode ( #298 )
2023-06-28 14:19:22 -07:00
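A sketch of the new argument; "auto" prefers the fast tokenizer when one is available, while "slow" forces the slow one:

```python
from vllm import LLM

# Force the slow (Python) tokenizer; "auto" picks the fast one when available.
llm = LLM(model="facebook/opt-125m", tokenizer_mode="slow")
```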