Update README.md

Duane Merrill 2017-12-05 22:25:42 -05:00 committed by GitHub
parent 57747e382e
commit e2bf51c3fe


@@ -20,6 +20,9 @@ point (FP64) types. Furthermore, CUTLASS demonstrates CUDA's WMMA API for targe
 the programmable, high-throughput _Tensor Cores_ provided by NVIDIA's Volta architecture
 and beyond.
+For more exposition, see our Parallel Forall blog post ["CUTLASS: Fast Linear Algebra
+in CUDA C++"](https://devblogs.nvidia.com/parallelforall/cutlass-linear-algebra-cuda).
 # Project Structure
 CUTLASS is arranged as a header-only library with several example test programs
@@ -56,7 +59,7 @@ transpositions. Be sure to specify your target architecture.
 <s|d|h|i|w>gemm_<nn|nt|tn|tt>
 [--help]
-[--schmoo || --m=<height> --n=<width> --k=<depth>]
+[--schmoo=<#schmoo-samples> || --m=<height> --n=<width> --k=<depth>]
 [--i=<timing iterations>]
 [--device=<device-id>]
 [--alpha=<alpha> --beta=<beta>]
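The usage line above names each test program by a type prefix and a two-letter transposition suffix, and takes the problem size `--m`, `--n`, `--k` and the scalars `--alpha` and `--beta`. The following is a minimal sketch of a host-side reference GEMM showing what those parameters are assumed to mean. It assumes BLAS-style semantics, C = alpha * op(A) * op(B) + beta * C, with column-major storage. It also assumes that the first and second suffix letters say whether A and B are transposed (`n` = not transposed, `t` = transposed). The functions `reference_gemm` and `element` are hypothetical illustrations, not CUTLASS APIs. The sketch uses `float` only, whereas the `<s|d|h|i|w>` prefix presumably selects among element types.

```cpp
// Hypothetical host-side reference GEMM (not CUTLASS code). Assumes BLAS-style
// semantics and column-major storage:
//   C (m x n) = alpha * op(A) (m x k) * op(B) (k x n) + beta * C
#include <cassert>
#include <cstddef>
#include <vector>

// Returns op(X)(row, col) for a column-major matrix X with leading dimension ld.
// When trans is true, op(X) = X^T, so the stored element X(col, row) is read.
inline float element(const std::vector<float>& X, int ld, bool trans,
                     int row, int col) {
    return trans ? X[col + static_cast<std::size_t>(row) * ld]
                 : X[row + static_cast<std::size_t>(col) * ld];
}

// a_trans / b_trans stand in for the assumed first / second suffix letters:
//   gemm_nn -> (false, false), gemm_nt -> (false, true),
//   gemm_tn -> (true, false),  gemm_tt -> (true, true).
void reference_gemm(int m, int n, int k, float alpha,
                    const std::vector<float>& A, bool a_trans,
                    const std::vector<float>& B, bool b_trans,
                    float beta, std::vector<float>& C) {
    // Buffer sizes must match the problem shape.
    assert(A.size() == static_cast<std::size_t>(m) * k);
    assert(B.size() == static_cast<std::size_t>(k) * n);
    assert(C.size() == static_cast<std::size_t>(m) * n);

    const int lda = a_trans ? k : m;  // row count of A as stored
    const int ldb = b_trans ? n : k;  // row count of B as stored
    const int ldc = m;

    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            // Dot product of row i of op(A) with column j of op(B).
            float acc = 0.0f;
            for (int p = 0; p < k; ++p) {
                acc += element(A, lda, a_trans, i, p) * element(B, ldb, b_trans, p, j);
            }
            float& c = C[i + static_cast<std::size_t>(j) * ldc];
            c = alpha * acc + beta * c;
        }
    }
}
```

For example, take A = [1 2; 3 4] and B = [5 6; 7 8], stored column-major as `{1, 3, 2, 4}` and `{5, 7, 6, 8}`, with C initialized to all ones. Calling `reference_gemm(2, 2, 2, 1.0f, A, false, B, false, 2.0f, C)` yields [21 24; 45 52], that is `{21, 45, 24, 52}` in memory. A test harness would typically compare a GPU kernel's output against a simple reference like this one.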