Update README.md

2017-12-05 22:54:49 -05:00
commit 537a4bcedf
parent 5bd3f09312
1 changed file with 2 additions and 1 deletion
--- a/README.md
+++ b/README.md
@@ -27,7 +27,8 @@ in CUDA C++"](https://devblogs.nvidia.com/parallelforall/cutlass-linear-algebra-

 ![ALT](/media/cutlass-performance-plot.png "Relative performance of CUTLASS and cuBLAS for large matrices")

-CUTLASS is very efficient, with performance comparable to cuBLAS for scalar GEMM 
+CUTLASS primitives are very efficient.  When used to construct device-wide GEMM kernels, 
+they exhibit performance comparable to cuBLAS for scalar GEMM 
 computations. The above figure shows CUTLASS performance relative to cuBLAS 
 for large matrix dimensions (M=10240, N=K=4096) running on an NVIDIA Tesla V100 GPU 
 when compiled with CUDA 9.0.