vllm/docs/source/models
Kuntai Du 81ede99ca4
[Core] Deprecating block manager v1 and make block manager v2 default (#8704)
Removing block manager v1. This is the initial piece of the prefix-caching-centric design. To achieve that design, we need to simplify the code path so that only the v2 block manager is used (which has much higher prefix-caching performance).
2024-10-17 11:38:15 -05:00
adding_model.rst [Misc] Collect model support info in a single process per model (#9233) 2024-10-11 11:08:11 +00:00
enabling_multimodal_inputs.rst [VLM][Core] Support profiling with multiple multi-modal inputs per prompt (#7126) 2024-08-14 17:55:42 +00:00
engine_args.rst [Doc][CI/Build] Update docs and tests to use vllm serve (#6431) 2024-07-17 07:43:21 +00:00
lora.rst [Core] Support Lora lineage and base model metadata management (#6315) 2024-09-20 06:20:56 +00:00
performance.rst [Doc] Compatibility matrix for mutual exclusive features (#8512) 2024-10-11 11:18:50 -07:00
spec_decode.rst [Core] Deprecating block manager v1 and make block manager v2 default (#8704) 2024-10-17 11:38:15 -05:00
supported_models.rst [Model][Bugfix] Add FATReLU activation and support for openbmb/MiniCPM-S-1B-sft (#9396) 2024-10-16 16:40:24 +00:00
vlm.rst [Misc] Consolidate example usage of OpenAI client for multimodal models (#9412) 2024-10-16 11:20:51 +00:00