From 9dcb2b4c7d32a927ac8cf4fd69410ce603d27367 Mon Sep 17 00:00:00 2001
From: Duane Merrill
Date: Tue, 5 Dec 2017 20:55:03 -0500
Subject: [PATCH] Update README.md

---
 README.md | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index f4e21c95..2fc643cb 100644
--- a/README.md
+++ b/README.md
@@ -2,14 +2,15 @@
 
 # Introduction
 
-CUTLASS is a collection of templated CUDA C++ abstractions for implementing
+CUTLASS is a collection of CUDA C++ template abstractions for implementing
 high-performance matrix-multiplication (GEMM) at all levels and scales within CUDA.
-It incorporates the same stragies for data movemement and hierarchical decomposition
+It incorporates the same strategies for data movement and hierarchical decomposition
 that are used to implement cuBLAS. CUTLASS decomposes these “moving parts” into
-reusabe, modular software components abstracted by C++ template classes. These
-thread-wide, warp-wide, block-wide, and device-wide abstractions can be specialized
-by custom tiling sizes, data types, and other algorithmic policy. This flexibility
-allows them to be used as building blocks within custom kernels and applications.
+reusable, modular software components abstracted by C++ template classes. These
+thread-wide, warp-wide, block-wide, and device-wide primitives can be specialized
+and tuned via custom tiling sizes, data types, and other algorithmic policy.
+The resulting flexibility simplifies their use as building blocks within custom
+kernels and applications.
 
 To support a wide variety of applications, CUTLASS provides extensive support for
 mixed-precision computations, providing specialized data-movement and