Fix cuTE compilation with clang (#939)

- clang 1.14 reports a missing function when the call is compiled as host code:
  cutlass/include/cute/arch/util.hpp:106:32: error: no matching function for call to '__cvta_generic_to_shared'
  return static_cast<uint32_t>(__cvta_generic_to_shared(ptr));
- Fix this by using the __host__ __device__ definition of CUTE_HOST_DEVICE when compiling with clang as well

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
Janusz Lisiecki 2023-05-09 15:51:45 +02:00 committed by GitHub
parent 7c04f95415
commit 24c8b7d8a2

@@ -30,7 +30,7 @@
  **************************************************************************************************/
 #pragma once
-#if defined(__CUDA_ARCH__) || defined(_NVHPC_CUDA)
+#if defined(__CUDA_ARCH__) || defined(_NVHPC_CUDA) || defined(__clang__)
 # define CUTE_HOST_DEVICE __forceinline__ __host__ __device__
 # define CUTE_DEVICE __forceinline__ __device__
 # define CUTE_HOST __forceinline__ __host__
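
The guard changed above can be illustrated with a small standalone sketch. It is not the CuTe header itself: the names SKETCH_HOST_DEVICE, SKETCH_CUDA_ANNOTATIONS, and sketch_ptr_to_u32 are hypothetical, and the extra `defined(__CUDA__)` check is an assumption added here so that the file also builds as plain C++ with clang (clang defines `__CUDA__` only when it is compiling CUDA code). The point it shows is that a function must carry `__host__ __device__` in clang's host pass too, while the device-only intrinsic stays behind a `__CUDA_ARCH__` check.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical macro names for illustration only.
// The first two conditions mirror the commit; the clang condition is narrowed
// with __CUDA__ (an assumption of this sketch) so plain C++ builds still work.
#if defined(__CUDA_ARCH__) || defined(_NVHPC_CUDA) || \
    (defined(__clang__) && defined(__CUDA__))
#  define SKETCH_HOST_DEVICE __forceinline__ __host__ __device__
#  define SKETCH_CUDA_ANNOTATIONS 1
#else
#  define SKETCH_HOST_DEVICE inline
#  define SKETCH_CUDA_ANNOTATIONS 0
#endif

// Converts a pointer to a 32-bit value.
// Device pass: uses the CUDA intrinsic named in the error message above.
// Host pass (or plain C++): falls back to truncating the integer address,
// so no device-only function is ever referenced from host code.
SKETCH_HOST_DEVICE std::uint32_t sketch_ptr_to_u32(void const* ptr) {
  assert(ptr != nullptr);
#if defined(__CUDA_ARCH__)
  return static_cast<std::uint32_t>(__cvta_generic_to_shared(ptr));
#else
  return static_cast<std::uint32_t>(reinterpret_cast<std::uintptr_t>(ptr));
#endif
}
```

Compiled as ordinary C++, SKETCH_CUDA_ANNOTATIONS is 0 and the function is a plain inline host function; compiled as CUDA, the same source should get both execution-space attributes in every pass.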