cutlass/include/cute
Gregory Meyer (gregjm) ecbd24566c
Enable shared memory intrinsics and ldmatrix PTX on Clang. (#754)

This commit adds preprocessor checks to enable the shared memory
intrinsics `__cvta_generic_to_shared` and `__nvvm_get_smem_pointer`, as
well as the `ldmatrix` PTX instructions, on Clang. Leaving these
intrinsics disabled causes a significant latency regression on Clang.
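
A minimal sketch of the kind of preprocessor check described above, not the code from this commit. The macro `EXAMPLE_HAS_CVTA_GENERIC_TO_SHARED`, the helpers `example_has_fast_smem_cast` and `example_smem_addr`, and the Clang version threshold of 14 are all assumptions made for illustration. Only `__cvta_generic_to_shared` and the compiler-defined macros (`__clang__`, `__CUDA__`, `__CUDACC__`, `__CUDA_ARCH__`, `__clang_major__`) are real names.

```cpp
#include <cstdint>

// Hypothetical feature macro: set to 1 when the shared-memory address
// intrinsic may be used, 0 otherwise.
#if defined(__clang__) && defined(__CUDA__)
  // Clang compiling CUDA. Checked before __CUDACC__ because Clang's CUDA
  // wrapper headers may define __CUDACC__ as well.
  // ASSUMPTION: the minimum Clang major version (14) is illustrative.
  #if __clang_major__ >= 14
    #define EXAMPLE_HAS_CVTA_GENERIC_TO_SHARED 1
  #endif
#elif defined(__CUDACC__)
  // nvcc: assume the intrinsic is available.
  #define EXAMPLE_HAS_CVTA_GENERIC_TO_SHARED 1
#endif

#ifndef EXAMPLE_HAS_CVTA_GENERIC_TO_SHARED
  #define EXAMPLE_HAS_CVTA_GENERIC_TO_SHARED 0
#endif

// Reports whether the intrinsic-based path was compiled in.
constexpr bool example_has_fast_smem_cast() {
  return EXAMPLE_HAS_CVTA_GENERIC_TO_SHARED != 0;
}

#if defined(__CUDA_ARCH__) && EXAMPLE_HAS_CVTA_GENERIC_TO_SHARED
// Device-only: convert a generic pointer into a 32-bit shared-memory
// address, as consumed by PTX instructions such as ldmatrix.
__device__ inline std::uint32_t example_smem_addr(const void* ptr) {
  return static_cast<std::uint32_t>(__cvta_generic_to_shared(ptr));
}
#endif
```

When compiled as plain host C++ (neither nvcc nor Clang in CUDA mode), the macro evaluates to 0 and the device helper is skipped, so the header stays usable outside CUDA translation units.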

* refine the macro

---------

Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2023-03-31 21:42:24 -04:00
| Name | Last commit | Date |
|---|---|---|
| algorithm | CUTLASS 3.0.0 (#786) | 2023-01-23 20:55:28 -05:00 |
| arch | Enable shared memory intrinsics and ldmatrix PTX on Clang. (#754) | 2023-03-31 21:42:24 -04:00 |
| atom | CUTLASS 3.0 Hopper GEMMs are GETTs in disguise (#897) | 2023-03-29 10:42:40 -04:00 |
| container | CUTLASS 3.0.0 (#786) | 2023-01-23 20:55:28 -05:00 |
| numeric | CUTLASS 3.0.0 (#786) | 2023-01-23 20:55:28 -05:00 |
| util | CUTLASS 3.0.0 (#786) | 2023-01-23 20:55:28 -05:00 |
| config.hpp | CUTLASS 3.0.0 (#786) | 2023-01-23 20:55:28 -05:00 |
| int_tuple.hpp | Fix 8.4 + CUDA 11.4 build (#789) | 2023-01-27 09:18:59 -05:00 |
| layout.hpp | CUTLASS 3.0.0 (#786) | 2023-01-23 20:55:28 -05:00 |
| pointer.hpp | CUTLASS 3.0.0 (#786) | 2023-01-23 20:55:28 -05:00 |
| stride.hpp | CUTLASS 3.0.0 (#786) | 2023-01-23 20:55:28 -05:00 |
| swizzle_layout.hpp | CUTLASS 3.0.0 (#786) | 2023-01-23 20:55:28 -05:00 |
| swizzle_ptr.hpp | CUTLASS 3.0.0 (#786) | 2023-01-23 20:55:28 -05:00 |
| swizzle.hpp | CUTLASS 3.0.0 (#786) | 2023-01-23 20:55:28 -05:00 |
| tensor_predicate.hpp | CUTLASS 3.0.0 (#786) | 2023-01-23 20:55:28 -05:00 |
| tensor.hpp | CUTLASS 3.0.0 (#786) | 2023-01-23 20:55:28 -05:00 |
| tile.hpp | CUTLASS 3.0.0 (#786) | 2023-01-23 20:55:28 -05:00 |
| underscore.hpp | CUTLASS 3.0.0 (#786) | 2023-01-23 20:55:28 -05:00 |