cutlass

Gregory Meyer (gregjm) ecbd24566c Enable shared memory intrinsics and ldmatrix PTX on Clang. (#754)	2023-03-31 21:42:24 -04:00

* Enable shared memory intrinsics and ldmatrix PTX on Clang.

  This commit adds preprocessor checks to enable the shared memory intrinsics `__cvta_generic_to_shared` and `__nvvm_get_smem_pointer`, as well as the `ldmatrix` PTX instructions, on Clang. Preventing these intrinsics from being used is a significant latency regression on Clang.

* refine the macro

---------

Co-authored-by: Haicheng Wu <haichengw@nvidia.com>

algorithm	CUTLASS 3.0.0 (#786)	2023-01-23 20:55:28 -05:00
arch	Enable shared memory intrinsics and ldmatrix PTX on Clang. (#754)	2023-03-31 21:42:24 -04:00
atom	CUTLASS 3.0 Hopper GEMMs are GETTs in disguise (#897)	2023-03-29 10:42:40 -04:00
container	CUTLASS 3.0.0 (#786)	2023-01-23 20:55:28 -05:00
numeric	CUTLASS 3.0.0 (#786)	2023-01-23 20:55:28 -05:00
util	CUTLASS 3.0.0 (#786)	2023-01-23 20:55:28 -05:00
config.hpp	CUTLASS 3.0.0 (#786)	2023-01-23 20:55:28 -05:00
int_tuple.hpp	Fix 8.4 + CUDA 11.4 build (#789)	2023-01-27 09:18:59 -05:00
layout.hpp	CUTLASS 3.0.0 (#786)	2023-01-23 20:55:28 -05:00
pointer.hpp	CUTLASS 3.0.0 (#786)	2023-01-23 20:55:28 -05:00
stride.hpp	CUTLASS 3.0.0 (#786)	2023-01-23 20:55:28 -05:00
swizzle_layout.hpp	CUTLASS 3.0.0 (#786)	2023-01-23 20:55:28 -05:00
swizzle_ptr.hpp	CUTLASS 3.0.0 (#786)	2023-01-23 20:55:28 -05:00
swizzle.hpp	CUTLASS 3.0.0 (#786)	2023-01-23 20:55:28 -05:00
tensor_predicate.hpp	CUTLASS 3.0.0 (#786)	2023-01-23 20:55:28 -05:00
tensor.hpp	CUTLASS 3.0.0 (#786)	2023-01-23 20:55:28 -05:00
tile.hpp	CUTLASS 3.0.0 (#786)	2023-01-23 20:55:28 -05:00
underscore.hpp	CUTLASS 3.0.0 (#786)	2023-01-23 20:55:28 -05:00