vllm/csrc/quantization/fp8
Varun Sundar Rabindranath 2e26564259
[ Kernel ] FP8 Dynamic Per Token Quant - Add scale_ub (#6593)
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2024-07-19 18:15:26 -07:00
..
amd [CI/Build] Enforce style for C++ and CUDA code with clang-format (#4722) 2024-05-22 07:18:41 +00:00
nvidia [CI/Build] Enforce style for C++ and CUDA code with clang-format (#4722) 2024-05-22 07:18:41 +00:00
common.cu [ Kernel ] FP8 Dynamic Per Token Quant - Add scale_ub (#6593) 2024-07-19 18:15:26 -07:00
fp8_marlin.cu [Kernel] Expand FP8 support to Ampere GPUs using FP8 Marlin (#5975) 2024-07-03 17:38:00 +00:00