vllm/docs/source/serving

Latest commit: [Bugfix][Frontend] Reject guided decoding in multistep mode (#9892) by Joe Runde (031a7995f3), 2024-11-01 01:09:46 +00:00
| File | Last commit | Last commit date |
|---|---|---|
| compatibility_matrix.rst | [Bugfix][Frontend] Reject guided decoding in multistep mode (#9892) | 2024-11-01 01:09:46 +00:00 |
| deploying_with_bentoml.rst | docs: Add BentoML deployment doc (#3336) | 2024-03-12 10:34:30 -07:00 |
| deploying_with_cerebrium.rst | [DOC] - Add docker image to Cerebrium Integration (#6510) | 2024-07-17 10:22:53 -07:00 |
| deploying_with_docker.rst | [Doc] Update docker references (#5614) | 2024-06-19 15:01:45 -07:00 |
| deploying_with_dstack.rst | [Doc][CI/Build] Update docs and tests to use vllm serve (#6431) | 2024-07-17 07:43:21 +00:00 |
| deploying_with_k8s.rst | [Doc]: Add deploying_with_k8s guide (#8451) | 2024-10-07 13:31:45 -07:00 |
| deploying_with_kserve.rst | Update link to KServe deployment guide (#9173) | 2024-10-09 03:58:49 +00:00 |
| deploying_with_lws.rst | Support to serve vLLM on Kubernetes with LWS (#4829) | 2024-05-16 16:37:29 -07:00 |
| deploying_with_nginx.rst | [Hardware][Intel CPU][DOC] Update docs for CPU backend (#6212) | 2024-10-22 10:38:04 -07:00 |
| deploying_with_triton.rst | Add documentation to Triton server tutorial (#983) | 2023-09-20 10:32:40 -07:00 |
| distributed_serving.rst | [doc] update pp support (#9853) | 2024-10-30 13:36:51 -07:00 |
| env_vars.rst | [doc][misc] add note for Kubernetes users (#5916) | 2024-06-27 10:07:07 -07:00 |
| faq.rst | [Documentation][Spec Decode] Add documentation about lossless guarantees in Speculative Decoding in vLLM (#7962) | 2024-09-05 16:25:29 -04:00 |
| integrations.rst | llama_index serving integration documentation (#6973) | 2024-08-14 15:38:37 -07:00 |
| metrics.rst | Add Production Metrics in Prometheus format (#1890) | 2023-12-02 16:37:44 -08:00 |
| openai_compatible_server.md | [Model] tool calling support for ibm-granite/granite-20b-functioncalling (#8339) | 2024-10-29 15:07:37 -07:00 |
| run_on_sky.rst | [Misc][OpenAI] deprecate max_tokens in favor of new max_completion_tokens field for chat completion endpoint (#9837) | 2024-10-30 18:15:56 -07:00 |
| serving_with_langchain.rst | docs: fix langchain (#2736) | 2024-02-03 18:17:55 -08:00 |
| serving_with_llamaindex.rst | llama_index serving integration documentation (#6973) | 2024-08-14 15:38:37 -07:00 |
| tensorizer.rst | [Doc]: Update tensorizer docs to include vllm[tensorizer] (#7889) | 2024-10-22 15:43:25 -07:00 |
| usage_stats.md | Usage Stats Collection (#2852) | 2024-03-28 22:16:12 -07:00 |