Update README.md
parent 0428c89fd5
commit 6f091f5620
@@ -20,7 +20,7 @@ point (FP64) types. Furthermore, CUTLASS demonstrates CUDA's WMMA API for targe
the programmable, high-throughput _Tensor Cores_ provided by NVIDIA's Volta architecture
and beyond.



CUTLASS is very efficient, with performance comparable to cuBLAS for scalar GEMM
computations. The above figure shows CUTLASS performance relative to cuBLAS