diff --git a/docs/source/models/adding_model.rst b/docs/source/models/adding_model.rst
index 27b36994..a18ae7ef 100644
--- a/docs/source/models/adding_model.rst
+++ b/docs/source/models/adding_model.rst
@@ -18,7 +18,7 @@ This document provides a high-level guide on integrating a `HuggingFace Transfor
 0. Fork the vLLM repository
 --------------------------------
 
-Start by forking our `GitHub <https://github.com/vllm-project/vllm/>`_ repository and then :ref:`build it from source <build_from_source>`.
+Start by forking our `GitHub`_ repository and then :ref:`build it from source <build_from_source>`.
 This gives you the ability to modify the codebase and test your model.
 
 
diff --git a/docs/source/serving/deploying_with_docker.rst b/docs/source/serving/deploying_with_docker.rst
index 58fadc25..e1daecc5 100644
--- a/docs/source/serving/deploying_with_docker.rst
+++ b/docs/source/serving/deploying_with_docker.rst
@@ -11,12 +11,20 @@ The image is available on Docker Hub as `vllm/vllm-openai <https://hub.docker.c
 
 .. code-block:: console
 
     $ docker run --runtime nvidia --gpus all \
         -v ~/.cache/huggingface:/root/.cache/huggingface \
         --env "HUGGING_FACE_HUB_TOKEN=<secret>" \
+        -p 8000:8000 \
+        --ipc=host \
         vllm/vllm-openai:latest \
         --model mistralai/Mistral-7B-v0.1
 
+.. note::
+
+    You can use either the ``--ipc=host`` flag or the ``--shm-size`` flag to allow the
+    container to access the host's shared memory. vLLM uses PyTorch, which uses shared
+    memory to share data between processes under the hood, particularly for tensor parallel inference.
+
 You can build and run vLLM from source via the provided dockerfile. To build vLLM:
 
 .. code-block:: console
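
A quick sketch of the workflow the first hunk points to, for reviewers trying the change locally: fork the repository, clone the fork, and build from source with an editable install. The ``<your-username>`` placeholder is hypothetical and stands in for your own GitHub account name.

.. code-block:: console

    # Clone your fork of vLLM and build it from source (editable install)
    $ git clone https://github.com/<your-username>/vllm.git
    $ cd vllm
    $ pip install -e .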
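
To illustrate the alternative mentioned in the new note, here is a variant of the same ``docker run`` invocation that sets an explicit shared-memory size with ``--shm-size`` instead of sharing the host IPC namespace via ``--ipc=host``. The ``8g`` value is an arbitrary example, not a vLLM recommendation; size it to your tensor-parallel workload.

.. code-block:: console

    # Same server launch, but with a fixed shared-memory allocation
    $ docker run --runtime nvidia --gpus all \
        -v ~/.cache/huggingface:/root/.cache/huggingface \
        --env "HUGGING_FACE_HUB_TOKEN=<secret>" \
        -p 8000:8000 \
        --shm-size=8g \
        vllm/vllm-openai:latest \
        --model mistralai/Mistral-7B-v0.1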
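
The build command referenced by the trailing context ("To build vLLM:") lies outside this hunk. As a sketch, assuming the repository's Dockerfile exposes a ``vllm-openai`` build stage, the invocation would look something like:

.. code-block:: console

    # Build the OpenAI-compatible server image from the repo's Dockerfile
    $ DOCKER_BUILDKIT=1 docker build . --target vllm-openai --tag vllm/vllm-openai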