From 6cb88d53eb8d5cfd18189f57cbe252c287a1b5d5 Mon Sep 17 00:00:00 2001
From: Duane Merrill
Date: Tue, 5 Dec 2017 22:58:12 -0500
Subject: [PATCH] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 11fe5a1c..d346ca84 100644
--- a/README.md
+++ b/README.md
@@ -25,7 +25,7 @@ in CUDA C++"](https://devblogs.nvidia.com/parallelforall/cutlass-linear-algebra-
 # Performance
-![ALT](/media/cutlass-performance-plot.png "Relative performance of CUTLASS and cuBLAS for large matrices")
+

CUTLASS primitives are very efficient. When used to construct device-wide GEMM kernels, they exhibit performance comparable to cuBLAS for scalar GEMM