vllm/docs/source
Latest commit: ce4f5a29fb "Add Automatic Prefix Caching (#2762)" by Sage Moore, co-authored by ElizaWszola <eliza@neuralmagic.com> and Michael Goin <michael@neuralmagic.com>, 2024-03-02 00:50:01 -08:00
Name | Last commit | Date
assets/logos | Update README.md (#1292) | 2023-10-08 23:15:50 -07:00
dev/engine | [DOC] Add additional comments for LLMEngine and AsyncLLMEngine (#1011) | 2024-01-11 19:26:49 -08:00
getting_started | [ROCm] support Radeon™ 7900 series (gfx1100) without using flash-attention (#2768) | 2024-02-10 23:14:37 -08:00
models | Add Automatic Prefix Caching (#2762) | 2024-03-02 00:50:01 -08:00
quantization | [CI] Ensure documentation build is checked in CI (#2842) | 2024-02-12 22:53:07 -08:00
serving | docs: Add tutorial on deploying vLLM model with KServe (#2586) | 2024-03-01 11:04:14 -08:00
conf.py | Port metrics from aioprometheus to prometheus_client (#2730) | 2024-02-25 11:54:00 -08:00
index.rst | docs: Add tutorial on deploying vLLM model with KServe (#2586) | 2024-03-01 11:04:14 -08:00
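The headline commit above, Add Automatic Prefix Caching (#2762), introduced an engine option that lets requests sharing a common prompt prefix reuse cached KV blocks. Below is a minimal sketch of turning it on through the offline LLM entry point; it assumes the enable_prefix_caching engine argument added around that change, and the model name and prompts are placeholders for illustration only.

    # Minimal sketch: enabling automatic prefix caching (see #2762).
    # Assumes the enable_prefix_caching engine argument; model name and
    # prompts are placeholders, not taken from the listing above.
    from vllm import LLM, SamplingParams

    llm = LLM(model="facebook/opt-125m", enable_prefix_caching=True)

    # Requests that share a long common prefix can reuse its cached KV blocks.
    shared_prefix = "You are a helpful assistant. Answer concisely.\n\n"
    prompts = [
        shared_prefix + "Q: What is vLLM?\nA:",
        shared_prefix + "Q: What is paged attention?\nA:",
    ]

    outputs = llm.generate(prompts, SamplingParams(temperature=0.0, max_tokens=32))
    for out in outputs:
        print(out.outputs[0].text)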