[Docs] Add information about using shared memory in docker (#1845)

Simon Mo 2023-11-29 18:33:56 -08:00 committed by GitHub
parent a9e4574261
commit 0f621c2c7d
2 changed files with 10 additions and 2 deletions

@@ -18,7 +18,7 @@ This document provides a high-level guide on integrating a `HuggingFace Transfor
 0. Fork the vLLM repository
 --------------------------------
-Start by forking our `GitHub <https://github.com/vllm-project/vllm/>`_ repository and then :ref:`build it from source <build_from_source>`.
+Start by forking our `GitHub`_ repository and then :ref:`build it from source <build_from_source>`.
 This gives you the ability to modify the codebase and test your model.

@@ -11,12 +11,20 @@ The image is available on Docker Hub as `vllm/vllm-openai <https://hub.docker.co
     $ docker run --runtime nvidia --gpus all \
         -v ~/.cache/huggingface:/root/.cache/huggingface \
-        -p 8000:8000 \
         --env "HUGGING_FACE_HUB_TOKEN=<secret>" \
+        -p 8000:8000 \
+        --ipc=host \
         vllm/vllm-openai:latest \
         --model mistralai/Mistral-7B-v0.1
+.. note::
+    Use either the ``--ipc=host`` flag, which lets the container share the host's
+    shared memory, or the ``--shm-size`` flag, which enlarges the container's own
+    shared memory segment. vLLM uses PyTorch, which uses shared memory to share data
+    between processes, particularly for tensor parallel inference.
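As a rough illustration of the two shared-memory options named in the note, the sketch below assembles the same ``docker run`` command with either flag. The ``shm_flag`` helper is hypothetical (it is not part of vLLM or Docker), and the ``8g`` size is only an illustrative value, not a recommendation from the vLLM docs. The command is printed for review rather than executed.

```shell
# Hypothetical helper (not part of vLLM or Docker): picks the shared-memory flag.
#   shm_flag host      prints --ipc=host
#   shm_flag size 8g   prints --shm-size=8g
shm_flag() {
    case "$1" in
        host) printf '%s\n' "--ipc=host" ;;
        size) printf '%s\n' "--shm-size=${2:-8g}" ;;  # 8g is an assumed default
        *)    echo "usage: shm_flag host | size [SIZE]" >&2; return 1 ;;
    esac
}

# Dry run: print the full command for review instead of starting a container.
echo docker run --runtime nvidia --gpus all \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HUGGING_FACE_HUB_TOKEN=<secret>" \
    -p 8000:8000 \
    "$(shm_flag size 8g)" \
    vllm/vllm-openai:latest \
    --model mistralai/Mistral-7B-v0.1
```

Dropping the leading ``echo`` would run the container. ``--ipc=host`` removes IPC isolation between the container and the host, while ``--shm-size`` keeps the container isolated and only raises the size of its own ``/dev/shm``, so the second form may be preferable where that isolation matters.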
 You can build and run vLLM from source via the provided dockerfile. To build vLLM:
 .. code-block:: console