Update README.md

Duane Merrill 2017-12-05 22:44:01 -05:00 committed by GitHub
parent 0428c89fd5
commit 6f091f5620
GPG Key ID: 4AEE18F83AFDEB23


@@ -20,7 +20,7 @@ point (FP64) types. Furthermore, CUTLASS demonstrates CUDA's WMMA API for targe
 the programmable, high-throughput _Tensor Cores_ provided by NVIDIA's Volta architecture
 and beyond.
-![ALT](/media/fig-09-complete-hierarchy.png "Relative performance of CUTLASS and cuBLAS for large matrices")
+![ALT](/media/cutlass-performance-plot.png "Relative performance of CUTLASS and cuBLAS for large matrices")
 CUTLASS is very efficient, with performance comparable to cuBLAS for scalar GEMM
 computations. The above figure shows CUTLASS performance relative to cuBLAS