vllm/fp8 at 2e26564259801dc6359c49bb044104d8d5373b57 - vllm

Varun Sundar Rabindranath 2e26564259 [ Kernel ] FP8 Dynamic Per Token Quant - Add scale_ub (#6593) Co-authored-by: Varun Sundar Rabindranth <varun@neuralmagic.com>		2024-07-19 18:15:26 -07:00
amd	[CI/Build] Enforce style for C++ and CUDA code with `clang-format` (#4722)	2024-05-22 07:18:41 +00:00
nvidia	[CI/Build] Enforce style for C++ and CUDA code with `clang-format` (#4722)	2024-05-22 07:18:41 +00:00
common.cu	[ Kernel ] FP8 Dynamic Per Token Quant - Add scale_ub (#6593)	2024-07-19 18:15:26 -07:00
fp8_marlin.cu	[Kernel] Expand FP8 support to Ampere GPUs using FP8 Marlin (#5975)	2024-07-03 17:38:00 +00:00