vllm/docs/source/models
Kuntai Du 81ede99ca4
[Core] Deprecating block manager v1 and make block manager v2 default (#8704)
Removing block manager v1. This is the initial piece of the prefix-caching-centric design. To achieve that design, we need to simplify the code path so that only the v2 block manager is used (which has much higher prefix-caching performance).
2024-10-17 11:38:15 -05:00
adding_model.rst [Misc] Collect model support info in a single process per model (#9233) 2024-10-11 11:08:11 +00:00
enabling_multimodal_inputs.rst [VLM][Core] Support profiling with multiple multi-modal inputs per prompt (#7126) 2024-08-14 17:55:42 +00:00
engine_args.rst [Doc][CI/Build] Update docs and tests to use vllm serve (#6431) 2024-07-17 07:43:21 +00:00
lora.rst [Core] Support Lora lineage and base model metadata management (#6315) 2024-09-20 06:20:56 +00:00
performance.rst [Doc] Compatibility matrix for mutual exclusive features (#8512) 2024-10-11 11:18:50 -07:00
spec_decode.rst [Core] Deprecating block manager v1 and make block manager v2 default (#8704) 2024-10-17 11:38:15 -05:00
supported_models.rst [Model][Bugfix] Add FATReLU activation and support for openbmb/MiniCPM-S-1B-sft (#9396) 2024-10-16 16:40:24 +00:00
vlm.rst [Misc] Consolidate example usage of OpenAI client for multimodal models (#9412) 2024-10-16 11:20:51 +00:00