From 6f2dd6c37e984dd254d263007b4be0b60964630c Mon Sep 17 00:00:00 2001
From: Tanmay Verma
Date: Wed, 20 Sep 2023 10:32:40 -0700
Subject: [PATCH] Add documentation to Triton server tutorial (#983)

---
 docs/source/index.rst                         | 1 +
 docs/source/serving/deploying_with_triton.rst | 6 ++++++
 2 files changed, 7 insertions(+)
 create mode 100644 docs/source/serving/deploying_with_triton.rst

diff --git a/docs/source/index.rst b/docs/source/index.rst
index e6d0bc67..f2131cd8 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -64,6 +64,7 @@ Documentation
 
     serving/distributed_serving
     serving/run_on_sky
+    serving/deploying_with_triton
 
 .. toctree::
    :maxdepth: 1
diff --git a/docs/source/serving/deploying_with_triton.rst b/docs/source/serving/deploying_with_triton.rst
new file mode 100644
index 00000000..5ce7c3d0
--- /dev/null
+++ b/docs/source/serving/deploying_with_triton.rst
@@ -0,0 +1,6 @@
+.. _deploying_with_triton:
+
+Deploying with NVIDIA Triton
+============================
+
+The `Triton Inference Server <https://github.com/triton-inference-server>`_ hosts a tutorial demonstrating how to quickly deploy a simple `facebook/opt-125m <https://huggingface.co/facebook/opt-125m>`_ model using vLLM. Please see `Deploying a vLLM model in Triton <https://github.com/triton-inference-server/tutorials/blob/main/Quick_Deploy/vLLM/README.md>`_ for more details.