From 78107fa0911567f131cbad810872ae25594a4506 Mon Sep 17 00:00:00 2001
From: Sean Gallen
Date: Thu, 4 Apr 2024 23:52:01 -0500
Subject: [PATCH] [Doc]Add asynchronous engine arguments to documentation. (#3810)

Co-authored-by: Simon Mo
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
---
 docs/source/models/engine_args.rst | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/docs/source/models/engine_args.rst b/docs/source/models/engine_args.rst
index 9f5f672a..d8a7ac72 100644
--- a/docs/source/models/engine_args.rst
+++ b/docs/source/models/engine_args.rst
@@ -118,3 +118,19 @@ Below, you can find an explanation of every engine argument for vLLM:
 .. option:: --quantization (-q) {awq,squeezellm,None}
 
     Method used to quantize the weights.
+
+Async Engine Arguments
+----------------------
+Below are the additional arguments related to the asynchronous engine:
+
+.. option:: --engine-use-ray
+
+    Use Ray to start the LLM engine in a process separate from the server process.
+
+.. option:: --disable-log-requests
+
+    Disable logging of requests.
+
+.. option:: --max-log-len
+
+    Maximum number of prompt characters or prompt token IDs printed in the log. Defaults to unlimited.
\ No newline at end of file
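The `--disable-log-requests` and `--max-log-len` descriptions can be illustrated with a minimal Python sketch. It assumes that, when request logging is enabled, the prompt string and the prompt token-ID list are each truncated to their first `max_log_len` elements, and that `None` stands for the documented "unlimited" default. The helpers `shorten_for_log` and `log_request` and the log message format are hypothetical illustrations, not vLLM's API.

```python
from typing import List, Optional, Tuple


def shorten_for_log(
    prompt: Optional[str],
    prompt_token_ids: Optional[List[int]],
    max_log_len: Optional[int],
) -> Tuple[Optional[str], Optional[List[int]]]:
    """Hypothetical helper: truncate a prompt and its token IDs before logging.

    max_log_len=None is assumed to mean "unlimited", mirroring the
    documented default of --max-log-len.
    """
    if max_log_len is None:
        return prompt, prompt_token_ids
    # Slicing never raises, so inputs shorter than the limit pass through unchanged.
    short_prompt = prompt[:max_log_len] if prompt is not None else None
    short_ids = prompt_token_ids[:max_log_len] if prompt_token_ids is not None else None
    return short_prompt, short_ids


def log_request(
    request_id: str,
    prompt: Optional[str],
    prompt_token_ids: Optional[List[int]],
    disable_log_requests: bool,
    max_log_len: Optional[int],
) -> Optional[str]:
    """Hypothetical logger: returns the log line, or None when logging is off."""
    # Assumed behavior of --disable-log-requests: no log line at all.
    if disable_log_requests:
        return None
    p, ids = shorten_for_log(prompt, prompt_token_ids, max_log_len)
    return f"Received request {request_id}: prompt: {p!r}, prompt_token_ids: {ids}"


if __name__ == "__main__":
    # Illustrative token IDs only; they do not come from a real tokenizer.
    print(log_request("req-0", "Hello, world!", list(range(10)), False, 5))
```

Because the same limit applies to both the character string and the ID list, a single flag bounds the size of every logged request regardless of whether the client sent text or pre-tokenized input.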