From e2bf51c3fe1e4575c3c4937d5faa709e2c93d959 Mon Sep 17 00:00:00 2001
From: Duane Merrill
Date: Tue, 5 Dec 2017 22:25:42 -0500
Subject: [PATCH] Update README.md

---
 README.md | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 87fd9a6e..ad859f04 100644
--- a/README.md
+++ b/README.md
@@ -20,6 +20,9 @@ point (FP64) types. Furthermore, CUTLASS demonstrates CUDA's WMMA API for targe
 the programmable, high-throughput _Tensor Cores_ provided by NVIDIA's Volta architecture
 and beyond.
 
+For more exposition, see our Parallel Forall blog post ["CUTLASS: Fast Linear Algebra
+in CUDA C++"](https://devblogs.nvidia.com/parallelforall/cutlass-linear-algebra-cuda).
+
 # Project Structure
 
 CUTLASS is arranged as a header-only library with several example test programs
@@ -56,7 +59,7 @@ transposititions. Be sure to specify your target architecture.
 
 
 gemm_ [--help]
-      [--schmoo || --m= --n= --k=]
+      [--schmoo=<#schmoo-samples> || --m= --n= --k=]
       [--i=]
       [--device=]
       [--alpha= --beta=]