Update README.md
parent 5bd3f09312
commit 537a4bcedf
@@ -27,7 +27,8 @@ in CUDA C++"](https://devblogs.nvidia.com/parallelforall/cutlass-linear-algebra-
-CUTLASS is very efficient, with performance comparable to cuBLAS for scalar GEMM
+CUTLASS primitives are very efficient. When used to construct device-wide GEMM kernels,
+they exhibit performance comparable to cuBLAS for scalar GEMM
 computations. The above figure shows CUTLASS performance relative to cuBLAS
 for large matrix dimensions (M=10240, N=K=4096) running on an NVIDIA Tesla V100 GPU
 when compiled with CUDA 9.0.