Add small example to metrics.rst (#10550)
parent 46fe9b46d8
commit 9afa014552
@@ -2,9 +2,34 @@ Production Metrics
==================

vLLM exposes a number of metrics that can be used to monitor the health of the
system. These metrics are exposed via the ``/metrics`` endpoint on the vLLM
OpenAI compatible API server.

You can start the server using Python, or using [Docker](deploying_with_docker.rst):

.. code-block:: console

    $ vllm serve unsloth/Llama-3.2-1B-Instruct
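To start the server with Docker instead, a minimal sketch along these lines should work, assuming the ``vllm/vllm-openai`` image and a GPU-enabled Docker runtime (see the Docker page linked above for the exact flags your environment needs):

.. code-block:: console

    $ docker run --gpus all --ipc=host -p 8000:8000 \
        -v ~/.cache/huggingface:/root/.cache/huggingface \
        vllm/vllm-openai:latest \
        --model unsloth/Llama-3.2-1B-Instruct

Metrics are then available on the mapped port exactly as in the ``vllm serve`` case.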
Then query the endpoint to get the latest metrics from the server:

.. code-block:: console

    $ curl http://0.0.0.0:8000/metrics

    # HELP vllm:iteration_tokens_total Histogram of number of tokens per engine_step.
    # TYPE vllm:iteration_tokens_total histogram
    vllm:iteration_tokens_total_sum{model_name="unsloth/Llama-3.2-1B-Instruct"} 0.0
    vllm:iteration_tokens_total_bucket{le="1.0",model_name="unsloth/Llama-3.2-1B-Instruct"} 3.0
    vllm:iteration_tokens_total_bucket{le="8.0",model_name="unsloth/Llama-3.2-1B-Instruct"} 3.0
    vllm:iteration_tokens_total_bucket{le="16.0",model_name="unsloth/Llama-3.2-1B-Instruct"} 3.0
    vllm:iteration_tokens_total_bucket{le="32.0",model_name="unsloth/Llama-3.2-1B-Instruct"} 3.0
    vllm:iteration_tokens_total_bucket{le="64.0",model_name="unsloth/Llama-3.2-1B-Instruct"} 3.0
    vllm:iteration_tokens_total_bucket{le="128.0",model_name="unsloth/Llama-3.2-1B-Instruct"} 3.0
    vllm:iteration_tokens_total_bucket{le="256.0",model_name="unsloth/Llama-3.2-1B-Instruct"} 3.0
    vllm:iteration_tokens_total_bucket{le="512.0",model_name="unsloth/Llama-3.2-1B-Instruct"} 3.0
    ...
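The response is in the standard Prometheus text exposition format, so it can also be consumed programmatically. As a rough sketch (assuming the ``requests`` and ``prometheus_client`` packages, and the same address and model as above):

.. code-block:: python

    import requests
    from prometheus_client.parser import text_string_to_metric_families

    # Fetch the raw Prometheus exposition text from the server started above.
    text = requests.get("http://0.0.0.0:8000/metrics").text

    # Walk every metric family and print the samples we care about.
    for family in text_string_to_metric_families(text):
        for sample in family.samples:
            if sample.name.startswith("vllm:iteration_tokens_total"):
                print(sample.name, sample.labels, sample.value)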
The following metrics are exposed:

.. literalinclude:: ../../../vllm/engine/metrics.py