Update README.md
parent 6565b48747
commit dd4dd4cebf
@@ -4,8 +4,8 @@
 
 CUTLASS is a collection of CUDA C++ template abstractions for implementing
 high-performance matrix-multiplication (GEMM) at all levels and scales within CUDA.
-It incorporates the same strategies for hierarchical decomposition and data movement
-that are used to implement cuBLAS. CUTLASS decomposes these “moving parts” into
+It incorporates strategies for hierarchical decomposition and data movement similar
+to those used to implement cuBLAS. CUTLASS decomposes these “moving parts” into
 reusable, modular software components abstracted by C++ template classes. These
 thread-wide, warp-wide, block-wide, and device-wide primitives can be specialized
 and tuned via custom tiling sizes, data types, and other algorithmic policy. The