Update README.md

This commit is contained in:
Duane Merrill 2017-12-05 22:53:11 -05:00 committed by GitHub
parent 6f091f5620
commit 5bd3f09312

@@ -20,15 +20,17 @@ point (FP64) types. Furthermore, CUTLASS demonstrates CUDA's WMMA API for targe
 the programmable, high-throughput _Tensor Cores_ provided by NVIDIA's Volta architecture
 and beyond.
 
-For more exposition, see our Parallel Forall blog post ["CUTLASS: Fast Linear Algebra
-in CUDA C++"](https://devblogs.nvidia.com/parallelforall/cutlass-linear-algebra-cuda).
-
 # Performance
 
 ![ALT](/media/cutlass-performance-plot.png "Relative performance of CUTLASS and cuBLAS for large matrices")
 
 CUTLASS is very efficient, with performance comparable to cuBLAS for scalar GEMM
 computations. The above figure shows CUTLASS performance relative to cuBLAS
-compiled with CUDA 9.0 running on an NVIDIA Tesla V100 GPU for large matrix
-dimensions (M=10240, N=K=4096).
+for large matrix dimensions (M=10240, N=K=4096) running on an NVIDIA Tesla V100 GPU
+when compiled with CUDA 9.0.
+
+For more exposition, see our Parallel Forall blog post ["CUTLASS: Fast Linear Algebra
+in CUDA C++"](https://devblogs.nvidia.com/parallelforall/cutlass-linear-algebra-cuda).
 
 # Project Structure