vllm/csrc/quantization/fp8
Philipp Moritz 12628d3c78
[Kernel] Optimize FP8 support for MoE kernel / Mixtral via static scales (#4343)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-04-27 04:49:59 +00:00
amd_detail Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290) 2024-04-03 14:15:55 -07:00
fp8_cuda_kernels.cu [Kernel] Optimize FP8 support for MoE kernel / Mixtral via static scales (#4343) 2024-04-27 04:49:59 +00:00