From 6f091f5620a49d865a1ded4396ee4ff9d2913bac Mon Sep 17 00:00:00 2001
From: Duane Merrill
Date: Tue, 5 Dec 2017 22:44:01 -0500
Subject: [PATCH] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index b1d4e2d8..6787f648 100644
--- a/README.md
+++ b/README.md
@@ -20,7 +20,7 @@ point (FP64) types. Furthermore, CUTLASS demonstrates CUDA's WMMA API for targe
 the programmable, high-throughput _Tensor Cores_ provided by NVIDIA's Volta
 architecture and beyond.
 
-![ALT](/media/fig-09-complete-hierarchy.png "Relative performance of CUTLASS and cuBLAS for large matrices")
+![ALT](/media/cutlass-performance-plot.png "Relative performance of CUTLASS and cuBLAS for large matrices")
 
 CUTLASS is very efficient, with performance comparable to cuBLAS for scalar GEMM
 computations. The above figure shows CUTLASS performance relative to cuBLAS