cutlass

Author	SHA1	Message	Date
Andrew Kerr	fb335f6a5f	CUTLASS 2.0 (#62). Substantially refactored for: better performance, particularly for native Turing Tensor Cores; robust and durable templates spanning the design space; encapsulated functionality embodying modern C++11 programming techniques; optimized containers and data types for efficient, generic, portable device code. Updates to: Quick start guide; Documentation; Utilities; CUTLASS Profiler. Native Turing Tensor Cores: efficient GEMM kernels targeting Turing Tensor Cores; mixed-precision floating point, 8-bit integer, 4-bit integer, and binarized operands. Coverage of existing CUTLASS functionality: GEMM kernels targeting CUDA and Tensor Cores in NVIDIA GPUs; Volta Tensor Cores through native mma.sync and through WMMA API; optimizations such as parallel reductions, threadblock rasterization, and intra-threadblock reductions; batched GEMM operations; complex-valued GEMMs. Note: this commit and all that follow require a host compiler supporting C++11 or greater.	2019-11-19 16:55:34 -08:00
Artem Belevich	e18292db46	Make CUTLASS compilable with Clang. Requires a recent clang build (r359248 or newer). Enable compilation with clang with these options: cmake -DCUDA_COMPILER=clang -DCMAKE_CXX_COMPILER=/path/to/clang++	2019-05-02 11:00:22 -07:00
Andrew Kerr	877bdcace6	Cutlass 1.3 Release (#42). Efficient GEMM kernel targeting Volta Tensor Cores via mma.sync instruction added in CUDA 10.1.	2019-03-20 10:49:17 -07:00
akerr	74df0331f2	CUTLASS 1.2	2018-10-26 14:38:46 -07:00
akerr	461f417b9d	Checkpointing CUTLASS 1.1 release.	2018-09-18 16:58:03 -07:00
akerr	2028ebe120	CUTLASS v1.0 release	2018-05-16 11:44:56 -07:00

6 Commits