From b0925b38789bb3b20dcc39e229fcfe12a311e487 Mon Sep 17 00:00:00 2001
From: Sherlock Xu <65327072+Sherlock113@users.noreply.github.com>
Date: Wed, 13 Mar 2024 01:34:30 +0800
Subject: [PATCH] docs: Add BentoML deployment doc (#3336)

Signed-off-by: Sherlock113
---
 docs/source/index.rst                          | 1 +
 docs/source/serving/deploying_with_bentoml.rst | 8 ++++++++
 2 files changed, 9 insertions(+)
 create mode 100644 docs/source/serving/deploying_with_bentoml.rst

diff --git a/docs/source/index.rst b/docs/source/index.rst
index c0250bf9..65bfbbab 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -73,6 +73,7 @@ Documentation
    serving/run_on_sky
    serving/deploying_with_kserve
    serving/deploying_with_triton
+   serving/deploying_with_bentoml
    serving/deploying_with_docker
    serving/serving_with_langchain
    serving/metrics
diff --git a/docs/source/serving/deploying_with_bentoml.rst b/docs/source/serving/deploying_with_bentoml.rst
new file mode 100644
index 00000000..4b9d19f5
--- /dev/null
+++ b/docs/source/serving/deploying_with_bentoml.rst
@@ -0,0 +1,8 @@
+.. _deploying_with_bentoml:
+
+Deploying with BentoML
+======================
+
+`BentoML `_ allows you to deploy a large language model (LLM) server with vLLM as the backend, which exposes OpenAI-compatible endpoints. You can serve the model locally or containerize it as an OCI-compliant image and deploy it on Kubernetes.
+
+For details, see the tutorial `vLLM inference in the BentoML documentation `_.
\ No newline at end of file