cutlass

Author	SHA1	Message	Date
Pradeep Ramani	c008b4aea8	CUTLASS 3.3.0 (#1167) * Release 3.3.0: Adds support for mixed precision GEMMs on Hopper and Ampere; adds support for < 16B aligned GEMMs on Hopper; enhancements to EVT; enhancements to Python interface; enhancements to sub-byte type handling in CuTe; several other bug fixes and performance improvements. * Minor doc update	2023-11-02 11:09:05 -04:00
ANIKET SHIVAM	90d3b0fb18	CUTLASS 3.2.1 (#1113) * Updates for 3.2.1 release. * Minor fix in gemm op profiler for raster order. * Add scheduler mapping for raster order in the kernels.	2023-09-26 17:24:26 -04:00
ANIKET SHIVAM	4575443d44	CUTLASS 3.2 (#1024) * CUTLASS 3.2	2023-08-07 20:50:32 -04:00
ANIKET SHIVAM	d572cc1aab	CUTLASS 3.1 (#915) Co-authored-by: Aniket Shivam <ashivam@nvidia.com>	2023-04-14 23:19:34 -04:00
Mark Hoemmen	add4ba622f	Fix 8.4 + CUDA 11.4 build (#789) Work around a likely GCC 8.x issue with fold expressions and generic lambdas. Only use the work-around when the host compiler is GCC 8.x. This avoids any concerns about the work-around possibly hindering inlining for a critical CuTe function (product). Users can experiment with the work-around for other compilers or compiler versions by defining the following macro: CUTE_FOLD_GENERIC_LAMBDA_WORKAROUND. Fixes https://github.com/NVIDIA/cutlass/issues/788 Co-authored-by: Mark Hoemmen <mhoemmen@nvidia.com>	2023-01-27 09:18:59 -05:00
Vijay Thakkar	277bd6e537	CUTLASS 3.0.0 (#786) * CUTLASS 3.0.0	2023-01-23 20:55:28 -05:00