# CUTLASS Developers and Contributors
This is the official list of CUTLASS developers and contributors.
## DEVELOPERS

Andrew Kerr

Haicheng Wu

Naila Farooqui

Dustyn Blasig

Pradeep Ramani

Manish Gupta

Aditya Atluri

Paul Springer

David Tanner

Scott Yokim

Jin Wang

## CONTRIBUTORS

Timothy Costa

Julien Demouth

Brian Fahs

Michael Goldfarb

Mostafa Hagog

Markus Hohnerbach

Fei Hu

Alan Kaatz

Tina Li

Timmy Liu

Duane Merrill

Kevin Siu

Markus Tavenrath

John Tran

Vicki Wang

Junkai Wu

Fung Xie

Albert Xu

Jack Yang

Xiuxia Zhang

Nick Zhao

## ACKNOWLEDGEMENTS

Girish Bharambe

Cris Cecka

Luke Durant

Olivier Giroux

Stephen Jones

Rishkul Kulkarni

Bryce Lelbach

Joel McCormack

Kyrylo Perelygin