Update README.md

Duane Merrill 2017-12-05 20:50:15 -05:00 committed by GitHub
parent 8ebd6b06d0
commit f30abfc00a
GPG Key ID: 4AEE18F83AFDEB23


@ -2,13 +2,20 @@
 # Introduction
-CUTLASS is a CUDA C++ template library for implementing matrix-multiply
-procedures that may be instantiated in CUDA device kernels. CUTLASS applies
-object-oriented and generic programming techniques to maximize flexibility of
-the resulting code and facilitate composition with caller-supplied functionality.
-CUDA C++ templates are used to specify policy decisions such as block sizes,
-data types of input and accumulator operands, and element-wise operations applied
-to the results of matrix multiply.
+CUTLASS is a collection of templated CUDA C++ abstractions for implementing
+high-performance matrix multiplication (GEMM) at all levels and scales within CUDA.
+It incorporates the same strategies for data movement and hierarchical decomposition
+that are used to implement cuBLAS. CUTLASS decomposes these “moving parts” into
+reusable, modular software components abstracted by C++ template classes. These
+thread-wide, warp-wide, block-wide, and device-wide abstractions can be specialized
+by custom tiling sizes, data types, and other algorithmic policies. This flexibility
+allows them to be used as building blocks within custom kernels and applications.
+To support a wide variety of applications, CUTLASS provides extensive support for
+mixed-precision computations, with specialized data-movement and
+multiply-accumulate abstractions for 8-bit integer, half-precision floating
+point (FP16), single-precision floating point (FP32), and double-precision floating
+point (FP64) types.
 # Project Structure
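The revised introduction describes GEMM building blocks that are specialized at compile time by tile sizes and data types, decomposed hierarchically, and able to accumulate narrow inputs at higher precision. The host-side C++ sketch below illustrates only those three ideas. It is not CUTLASS code, and `tiled_gemm` and its template parameters (`BlockM`, `ThreadM`, and so on) are hypothetical names. It also makes some simplifying assumptions. The thread/warp/block/device hierarchy is collapsed into two tile levels. All operands are row-major. The mixed-precision policy is one example only: 8-bit integer inputs widened to a 32-bit accumulator.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch, not CUTLASS's API: C = A * B where the tiling and the
// data types are compile-time policy chosen through template parameters.
// A is M x K, B is K x N, C is M x N; all are row-major.
template <typename ElementIn,         // operand type of A and B (e.g. int8_t)
          typename ElementAcc,        // accumulator/output type (e.g. int32_t)
          int BlockM, int BlockN,     // outer ("block-level") tile shape
          int ThreadM, int ThreadN>   // inner ("thread-level") tile shape
void tiled_gemm(int M, int N, int K,
                const std::vector<ElementIn>& A,
                const std::vector<ElementIn>& B,
                std::vector<ElementAcc>& C) {
  static_assert(BlockM % ThreadM == 0 && BlockN % ThreadN == 0,
                "inner tiles must evenly divide the outer tile");
  C.assign(static_cast<std::size_t>(M) * N, ElementAcc(0));

  // Level 1: walk the outer tiles of C. On a GPU this level would
  // typically map to a thread block.
  for (int bm = 0; bm < M; bm += BlockM) {
    for (int bn = 0; bn < N; bn += BlockN) {
      // Level 2: walk the inner tiles inside the current outer tile.
      for (int tm = bm; tm < bm + BlockM; tm += ThreadM) {
        for (int tn = bn; tn < bn + BlockN; tn += ThreadN) {
          // Each inner tile computes up to ThreadM x ThreadN elements of C;
          // the i < M and j < N checks handle partial tiles at the edges.
          for (int i = tm; i < tm + ThreadM && i < M; ++i) {
            for (int j = tn; j < tn + ThreadN && j < N; ++j) {
              ElementAcc acc(0);
              for (int k = 0; k < K; ++k) {
                // Widen both operands first so products are accumulated
                // in the (wider) accumulator type.
                acc += static_cast<ElementAcc>(A[i * K + k]) *
                       static_cast<ElementAcc>(B[k * N + j]);
              }
              C[i * N + j] = acc;
            }
          }
        }
      }
    }
  }
}
```

For example, with `A = {1, 2, 3, 4, 5, 6}` (2x3) and `B = {7, 8, 9, 10, 11, 12}` (3x2) stored as `std::int8_t`, calling `tiled_gemm<std::int8_t, std::int32_t, 4, 4, 2, 2>(2, 2, 3, A, B, C)` fills a `std::vector<std::int32_t>` `C` with `{58, 64, 139, 154}`. The tile shape only changes the order in which elements of `C` are visited, not the result. This reflects the design point of treating tiling as a tunable policy.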