vllm/csrc/quantization/fp8
Varun Sundar Rabindranath b5241e41d9
[ Kernel ] FP8 Dynamic-Per-Token Quant Kernel (#6511)
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2024-07-18 01:38:35 +00:00
..
amd [CI/Build] Enforce style for C++ and CUDA code with clang-format (#4722) 2024-05-22 07:18:41 +00:00
nvidia [CI/Build] Enforce style for C++ and CUDA code with clang-format (#4722) 2024-05-22 07:18:41 +00:00
common.cu [ Kernel ] FP8 Dynamic-Per-Token Quant Kernel (#6511) 2024-07-18 01:38:35 +00:00
fp8_marlin.cu [Kernel] Expand FP8 support to Ampere GPUs using FP8 Marlin (#5975) 2024-07-03 17:38:00 +00:00