vllm/benchmarks/kernels
Cody Yu a3a73ab069
[Misc] Load FP8 kv-cache scaling factors from checkpoints (#4893)
The 2nd PR for #4532.

This PR supports loading FP8 kv-cache scaling factors from an FP8 checkpoint (with a .kv_scale parameter).
2024-05-22 13:28:20 -07:00
benchmark_aqlm.py             [Core]refactor aqlm quant ops (#4351)                              2024-04-25 15:03:56 -04:00
benchmark_marlin.py           Add marlin unit tests and marlin benchmark script (#4815)          2024-05-16 09:36:49 -04:00
benchmark_mixtral_moe.py      [Kernel] Update fused_moe tuning script for FP8 (#4457)            2024-05-01 11:47:38 -07:00
benchmark_paged_attention.py  [Misc] Load FP8 kv-cache scaling factors from checkpoints (#4893)  2024-05-22 13:28:20 -07:00
benchmark_rope.py             [CI] Try introducing isort. (#3495)                                2024-03-25 07:59:47 -07:00
benchmark_shapes.py           Add marlin unit tests and marlin benchmark script (#4815)          2024-05-16 09:36:49 -04:00