diff --git a/README.md b/README.md
index 13f6986b..6dc4200e 100644
--- a/README.md
+++ b/README.md
@@ -4,7 +4,7 @@

 CUTLASS is a collection of CUDA C++ template abstractions for implementing
 high-performance matrix-multiplication (GEMM) at all levels and scales within CUDA.
-It incorporates the same strategies for data movement and hierarchical decomposition
+It incorporates the same strategies for hierarchical decomposition and data movement
 that are used to implement cuBLAS. CUTLASS decomposes these “moving parts” into
 reusable, modular software components abstracted by C++ template classes. These
 thread-wide, warp-wide, block-wide, and device-wide primitives can be specialized