Add docs on serving with Llama Stack (#10183)

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>

parent fe15729a2b
commit 4800339c62

@@ -13,3 +13,4 @@ Integrations
   deploying_with_dstack
   serving_with_langchain
   serving_with_llamaindex
+  serving_with_llamastack

docs/source/serving/serving_with_llamastack.rst (new file, 42 lines)
@@ -0,0 +1,42 @@

.. _run_on_llamastack:

Serving with Llama Stack
========================

vLLM is also available via `Llama Stack <https://github.com/meta-llama/llama-stack>`_.

To install Llama Stack, run:

.. code-block:: console

    $ pip install llama-stack -q

Inference using OpenAI-Compatible API
-------------------------------------

Then start the Llama Stack server, pointing it to your vLLM server with the following configuration:

.. code-block:: yaml

    inference:
      - provider_id: vllm0
        provider_type: remote::vllm
        config:
          url: http://127.0.0.1:8000
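
The configuration above assumes a vLLM OpenAI-compatible server is already listening at ``http://127.0.0.1:8000``. A minimal sketch of starting one (the model name here is only an example):

.. code-block:: console

    $ vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000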

Please refer to `this guide <https://github.com/meta-llama/llama-stack/blob/main/docs/source/getting_started/distributions/self_hosted_distro/remote_vllm.md>`_ for more details on this remote vLLM provider.
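
Once the provider is configured, the Llama Stack server itself is launched with the ``llama`` CLI. A minimal sketch, assuming the ``inference`` block above has been saved into a run configuration file named ``run.yaml`` (the file name is illustrative; see the guide above for the full build and run workflow):

.. code-block:: console

    $ llama stack run ./run.yaml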

Inference via Embedded vLLM
---------------------------

An `inline vLLM provider
<https://github.com/meta-llama/llama-stack/tree/main/llama_stack/providers/inline/inference/vllm>`_
is also available. Here is a sample configuration using that method:

.. code-block:: yaml

    inference:
      - provider_type: vllm
        config:
          model: Llama3.1-8B-Instruct
          tensor_parallel_size: 4
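
With the embedded provider, vLLM runs inside the Llama Stack server process itself, so the machine running the server needs enough GPUs for the configured ``tensor_parallel_size`` (four in the example above). As a sketch, again assuming the block above lives in a hypothetical ``run.yaml``, the server is started the same way:

.. code-block:: console

    $ llama stack run ./run.yaml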