[Docs] Add information about using shared memory in docker (#1845)

2023-11-29 18:33:56 -08:00 · 0f621c2c7d
commit 0f621c2c7d
parent a9e4574261
2 changed files with 10 additions and 2 deletions
--- a/docs/source/models/adding_model.rst
+++ b/docs/source/models/adding_model.rst
@@ -18,7 +18,7 @@ This document provides a high-level guide on integrating a `HuggingFace Transfor
 0. Fork the vLLM repository
 --------------------------------

-Start by forking our `GitHub <https://github.com/vllm-project/vllm/>`_ repository and then :ref:`build it from source <build_from_source>`.
+Start by forking our `GitHub`_ repository and then :ref:`build it from source <build_from_source>`.
 This gives you the ability to modify the codebase and test your model.


--- a/docs/source/serving/deploying_with_docker.rst
+++ b/docs/source/serving/deploying_with_docker.rst
@@ -11,12 +11,20 @@ The image is available on Docker Hub as `vllm/vllm-openai <https://hub.docker.co

    $ docker run --runtime nvidia --gpus all \
        -v ~/.cache/huggingface:/root/.cache/huggingface \
-        -p 8000:8000 \
        --env "HUGGING_FACE_HUB_TOKEN=<secret>" \
+        -p 8000:8000 \
+        --ipc=host \
        vllm/vllm-openai:latest \
        --model mistralai/Mistral-7B-v0.1


+.. note::
+
+        You can either pass the ``--ipc=host`` flag to let the container use the host's
+        shared memory, or pass the ``--shm-size`` flag to enlarge the container's own shared
+        memory segment. vLLM uses PyTorch, which relies on shared memory to exchange data
+        between processes, particularly for tensor parallel inference.
+
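+As a minimal sketch of the ``--shm-size`` alternative, the command above could instead
+reserve a fixed amount of shared memory for the container. The ``8g`` value is an
+illustrative assumption, not a vLLM recommendation; size it for your model and
+tensor parallel degree.
+
+.. code-block:: console
+
+    $ docker run --runtime nvidia --gpus all \
+        -v ~/.cache/huggingface:/root/.cache/huggingface \
+        --env "HUGGING_FACE_HUB_TOKEN=<secret>" \
+        -p 8000:8000 \
+        --shm-size=8g \
+        vllm/vllm-openai:latest \
+        --model mistralai/Mistral-7B-v0.1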
+
 You can build and run vLLM from source via the provided dockerfile. To build vLLM:

 .. code-block:: console