Update README.md

2017-12-05 20:50:15 -05:00 · 2017-12-05 20:50:15 -05:00 · f30abfc00a
commit f30abfc00a
parent 8ebd6b06d0
1 changed file with 14 additions and 7 deletions
--- a/README.md
+++ b/README.md
@@ -2,13 +2,20 @@
 # Introduction
-CUTLASS is a CUDA C++ template library for implementing matrix-multiply
+CUTLASS is a collection of templated CUDA C++ abstractions for implementing 
-procedures that may be instantiated in CUDA device kernels. CUTLASS applies
+high-performance matrix-multiplication (GEMM) at all levels and scales within CUDA. 
-object-oriented and generic programming techniques to maximize flexibility of
+It incorporates the same strategies for data movement and hierarchical decomposition 
-the resulting code and facilitate composition with caller-supplied functionality.
+that are used to implement cuBLAS.  CUTLASS decomposes these “moving parts” into 
-CUDA C++ templates are used to specify policy decisions such as block sizes,
+reusable, modular software components abstracted by C++ template classes.  These
-data types of input and accumulator operands, and element-wise operations applied
+thread-wide, warp-wide, block-wide, and device-wide abstractions can be specialized 
-to the results of matrix multiply.
+by custom tiling sizes, data types, and other algorithmic policy.  This flexibility
 allows them to be used as building blocks within custom kernels and applications.
 To support a wide variety of applications, CUTLASS provides extensive support for
 mixed-precision computations, providing specialized data-movement and 
 multiply-accumulate abstractions for 8-bit integer, half-precision floating 
 point (FP16), single-precision floating point (FP32), and double-precision floating 
 point (FP64) types.
 # Project Structure