cutlass


Gregory Meyer (gregjm) ecbd24566c Enable shared memory intrinsics and ldmatrix PTX on Clang. (#754)	2023-03-31 21:42:24 -04:00

* Enable shared memory intrinsics and ldmatrix PTX on Clang.

  This commit adds preprocessor checks to enable the shared memory intrinsics `__cvta_generic_to_shared` and `__nvvm_get_smem_pointer`, as well as the `ldmatrix` PTX instructions, on Clang. Preventing these intrinsics from being used is a significant latency regression on Clang.

* refine the macro

---------

Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
cluster_sm90.hpp	CUTLASS 3.0.0 (#786)	2023-01-23 20:55:28 -05:00
copy_sm75.hpp	Enable shared memory intrinsics and ldmatrix PTX on Clang. (#754)	2023-03-31 21:42:24 -04:00
copy_sm80.hpp	CUTLASS 3.0.0 (#786)	2023-01-23 20:55:28 -05:00
copy_sm90_desc.hpp	CUTLASS 3.0.0 (#786)	2023-01-23 20:55:28 -05:00
copy_sm90_tma.hpp	CUTLASS 3.0.0 (#786)	2023-01-23 20:55:28 -05:00
copy_sm90.hpp	CUTLASS 3.0.0 (#786)	2023-01-23 20:55:28 -05:00
copy.hpp	CUTLASS 3.0.0 (#786)	2023-01-23 20:55:28 -05:00
mma_sm61.hpp	CUTLASS 3.0.0 (#786)	2023-01-23 20:55:28 -05:00
mma_sm70.hpp	CUTLASS 3.0.0 (#786)	2023-01-23 20:55:28 -05:00
mma_sm75.hpp	CUTLASS 3.0.0 (#786)	2023-01-23 20:55:28 -05:00
mma_sm80.hpp	CUTLASS 3.0.0 (#786)	2023-01-23 20:55:28 -05:00
mma_sm90_desc.hpp	CUTLASS 3.0.0 (#786)	2023-01-23 20:55:28 -05:00
mma_sm90_gmma.hpp	CUTLASS 3.0.0 (#786)	2023-01-23 20:55:28 -05:00
mma_sm90.hpp	CUTLASS 3.0.0 (#786)	2023-01-23 20:55:28 -05:00
mma.hpp	CUTLASS 3.0.0 (#786)	2023-01-23 20:55:28 -05:00
util.hpp	Enable shared memory intrinsics and ldmatrix PTX on Clang. (#754)	2023-03-31 21:42:24 -04:00