5 Commits

Author SHA1 Message Date
ANIKET SHIVAM
7c04f95415
Updates for 3.1 (#932)
2023-04-29 09:34:27 -04:00
Jakub Szuppe
180c5629bf
Add missing checks for NVRTC in CuTe (#921)
2023-04-25 12:52:43 -04:00
ANIKET SHIVAM
d572cc1aab
CUTLASS 3.1 (#915)
Co-authored-by: Aniket Shivam <ashivam@nvidia.com>
2023-04-14 23:19:34 -04:00
Gregory Meyer (gregjm)
ecbd24566c
Enable shared memory intrinsics and ldmatrix PTX on Clang. (#754)
* Enable shared memory intrinsics and ldmatrix PTX on Clang.

This commit adds preprocessor checks to enable the shared memory
intrinsics `__cvta_generic_to_shared` and `__nvvm_get_smem_pointer`, as
well as the `ldmatrix` PTX instructions, on Clang. Preventing these
intrinsics from being used causes a significant latency regression on Clang.

* refine the macro

---------

Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2023-03-31 21:42:24 -04:00
Vijay Thakkar
277bd6e537
CUTLASS 3.0.0 (#786)
* CUTLASS 3.0.0
2023-01-23 20:55:28 -05:00