Update README.md
parent 537a4bcedf
commit 6cb88d53eb
@@ -25,7 +25,7 @@ in CUDA C++"](https://devblogs.nvidia.com/parallelforall/cutlass-linear-algebra-
# Performance

<p align="center"><img src=/media/cutlass-performance-plot.png></p>
CUTLASS primitives are very efficient. When used to construct device-wide GEMM kernels,
they exhibit performance comparable to cuBLAS for scalar GEMM