Update README.md

parent 8ebd6b06d0 · commit f30abfc00a

README.md: 21 changed lines (14 additions, 7 deletions)
@@ -2,13 +2,20 @@

# Introduction

Removed:

CUTLASS is a CUDA C++ template library for implementing matrix-multiply procedures that may be instantiated in CUDA device kernels. CUTLASS applies object-oriented and generic programming techniques to maximize the flexibility of the resulting code and to facilitate composition with caller-supplied functionality. CUDA C++ templates are used to specify policy decisions such as block sizes, data types of input and accumulator operands, and element-wise operations applied to the results of matrix multiply.

Added:

CUTLASS is a collection of templated CUDA C++ abstractions for implementing high-performance matrix multiplication (GEMM) at all levels and scales within CUDA. It incorporates the same strategies for data movement and hierarchical decomposition that are used to implement cuBLAS. CUTLASS decomposes these “moving parts” into reusable, modular software components abstracted by C++ template classes. These thread-wide, warp-wide, block-wide, and device-wide abstractions can be specialized by custom tiling sizes, data types, and other algorithmic policy. This flexibility allows them to be used as building blocks within custom kernels and applications.
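
As a rough illustration of "specialized by custom tiling sizes, data types, and other algorithmic policy", the host-only C++ sketch below fixes the tile sizes, the operand types, the accumulator type, and an element-wise epilogue as template parameters. It is not taken from CUTLASS: `tiled_gemm` and `LinearCombination` are hypothetical names, the nested loops only mimic the device-, block-, and thread-level decomposition on the CPU (a warp level is omitted), and all matrices are assumed to be dense and row-major.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical element-wise epilogue: D = alpha * accumulator + beta * C.
template <typename Accum, typename Out>
struct LinearCombination {
  Accum alpha;
  Accum beta;
  Out operator()(Accum accum, Out c) const {
    return static_cast<Out>(alpha * accum + beta * static_cast<Accum>(c));
  }
};

// Hypothetical tiled GEMM on the host: C = epilogue(A * B, C), row-major.
// BlockM x BlockN plays the role of a thread-block tile and ThreadM x ThreadN
// the per-thread tile; on a GPU the outer loops would map to the grid/threads.
template <int BlockM, int BlockN, int ThreadM, int ThreadN, typename Accum,
          typename ElementA, typename ElementB, typename ElementC,
          typename Epilogue>
void tiled_gemm(int M, int N, int K, const std::vector<ElementA>& A,
                const std::vector<ElementB>& B, std::vector<ElementC>& C,
                Epilogue epilogue) {
  static_assert(BlockM % ThreadM == 0 && BlockN % ThreadN == 0,
                "thread tile must evenly divide the block tile");
  assert(A.size() == static_cast<std::size_t>(M) * K);
  assert(B.size() == static_cast<std::size_t>(K) * N);
  assert(C.size() == static_cast<std::size_t>(M) * N);

  // Device-wide level: walk the block tiles that cover C.
  for (int bm = 0; bm < M; bm += BlockM)
    for (int bn = 0; bn < N; bn += BlockN)
      // Block-wide level: walk the thread tiles inside one block tile.
      for (int tm = bm; tm < bm + BlockM && tm < M; tm += ThreadM)
        for (int tn = bn; tn < bn + BlockN && tn < N; tn += ThreadN)
          // Thread-wide level: compute one ThreadM x ThreadN patch of C.
          for (int i = tm; i < tm + ThreadM && i < M; ++i)
            for (int j = tn; j < tn + ThreadN && j < N; ++j) {
              Accum acc = Accum(0);  // products are summed in Accum
              for (int k = 0; k < K; ++k)
                acc += static_cast<Accum>(A[i * K + k]) *
                       static_cast<Accum>(B[k * N + j]);
              C[i * N + j] = epilogue(acc, C[i * N + j]);
            }
}
```

Calling `tiled_gemm<2, 2, 1, 1, float>(...)` or `tiled_gemm<4, 4, 2, 2, float>(...)` on the same inputs changes only the loop structure, not the result. The accumulator type is passed explicitly because it cannot be deduced from the arguments, which mirrors how such policy choices are made at compile time rather than at run time.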

To support a wide variety of applications, CUTLASS offers extensive support for mixed-precision computation, with specialized data-movement and multiply-accumulate abstractions for 8-bit integer, half-precision floating point (FP16), single-precision floating point (FP32), and double-precision floating point (FP64) types.
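
The core idea behind mixed-precision multiply-accumulate, narrow operands combined in a wider accumulator, can be sketched in portable C++ with a dot product. This is an illustration under stated assumptions, not CUTLASS's implementation: the hypothetical `mixed_dot` pairs `std::int8_t` inputs with a `std::int32_t` accumulator and `float` inputs with a `double` accumulator, since standard C++ before C++23 has no guaranteed FP16 type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical mixed-precision dot product: operands are stored in a narrow
// Element type, but every product and partial sum is formed in Accum.
template <typename Accum, typename Element>
Accum mixed_dot(const std::vector<Element>& a, const std::vector<Element>& b) {
  assert(a.size() == b.size());
  Accum acc = Accum(0);
  for (std::size_t i = 0; i < a.size(); ++i) {
    // Widen before multiplying so the product cannot overflow Element.
    acc += static_cast<Accum>(a[i]) * static_cast<Accum>(b[i]);
  }
  return acc;
}
```

With four `std::int8_t` elements equal to 127 in each vector, `mixed_dot<std::int32_t>` returns 64516, which fits in neither 8 nor 16 bits. Similarly, for `float` inputs `{16777216.0f, 1.0f}` and `{1.0f, 1.0f}`, a `double` accumulator yields 16777217, while a `float` accumulator under IEEE-754 single-precision rounding typically loses the added 1.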

# Project Structure