[Doc] Update documentation on Tensorizer (#5471)
parent cdab68dcdb
commit 6e2527a7cb
@@ -81,6 +81,7 @@ Documentation
 serving/env_vars
 serving/usage_stats
 serving/integrations
+serving/tensorizer
 
 .. toctree::
 :maxdepth: 1
docs/source/serving/tensorizer.rst (new file, 12 lines)
@@ -0,0 +1,12 @@
+.. _tensorizer:
+
+Loading Models with CoreWeave's Tensorizer
+==========================================
+vLLM supports loading models with `CoreWeave's Tensorizer <https://docs.coreweave.com/coreweave-machine-learning-and-ai/inference/tensorizer>`_.
+vLLM model tensors that have been serialized to disk, an HTTP/HTTPS endpoint, or an S3 endpoint can be deserialized
+at runtime extremely quickly and directly to the GPU, resulting in significantly
+shorter Pod startup times and lower CPU memory usage. Tensor encryption is also supported.
+
+For more information on CoreWeave's Tensorizer, please refer to
+`CoreWeave's Tensorizer documentation <https://github.com/coreweave/tensorizer>`_. For more information on serializing a vLLM model, as well as a general usage guide to using Tensorizer with vLLM, see
+the `vLLM example script <https://docs.vllm.ai/en/stable/getting_started/examples/tensorize_vllm_model.html>`_.
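As a hedged illustration of the usage the new page describes, the sketch below only assembles a command line; it does not start a server or load any weights. The `--load-format tensorizer` choice comes from the `EngineArgs` help text touched by this commit. The `--model-loader-extra-config` flag, its `tensorizer_uri` key, the model name, and the S3 path are assumptions for illustration and should be checked against the vLLM example script.

```python
import json
import shlex


def build_serve_command(model: str, tensorizer_uri: str) -> list[str]:
    """Assemble an argv list for serving a model from tensorized weights."""
    # Assumption: extra loader settings are passed as a JSON string.
    extra_config = json.dumps({"tensorizer_uri": tensorizer_uri})
    return [
        "python", "-m", "vllm.entrypoints.openai.api_server",
        "--model", model,
        "--load-format", "tensorizer",
        "--model-loader-extra-config", extra_config,
    ]


# Hypothetical model name and bucket path, used only as placeholders.
command = build_serve_command(
    "facebook/opt-125m",
    "s3://my-bucket/vllm/facebook/opt-125m/v1/model.tensors",
)
print(shlex.join(command))
```

Building the arguments as a list and quoting them with `shlex.join` keeps the JSON value intact when the command is pasted into a shell.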
@@ -230,7 +230,7 @@ class EngineArgs:
 '* "dummy" will initialize the weights with random values, '
 'which is mainly for profiling.\n'
 '* "tensorizer" will load the weights using tensorizer from '
-'CoreWeave. See the Tensorize vLLM Model script in the Examples'
+'CoreWeave. See the Tensorize vLLM Model script in the Examples '
 'section for more information.\n'
 '* "bitsandbytes" will load the weights using bitsandbytes '
 'quantization.\n')
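The one-character change in this hunk matters because Python joins adjacent string literals with no separator, so the old help text ran two words together. A minimal sketch using the two versions of the line from the hunk (the variable names `old_help` and `new_help` are illustrative and do not appear in the vLLM source):

```python
# Adjacent string literals are concatenated at compile time, with nothing
# inserted between them.
old_help = ('CoreWeave. See the Tensorize vLLM Model script in the Examples'
            'section for more information.\n')
new_help = ('CoreWeave. See the Tensorize vLLM Model script in the Examples '
            'section for more information.\n')

print("Examplessection" in old_help)   # True: the words were fused
print("Examples section" in new_help)  # True: the trailing space fixes it
```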