Updating readme with relative perf chart
parent e2bf51c3fe
commit 0428c89fd5
@@ -5,7 +5,7 @@
 CUTLASS is a collection of CUDA C++ template abstractions for implementing
 high-performance matrix-multiplication (GEMM) at all levels and scales within CUDA.
 It incorporates strategies for hierarchical decomposition and data movement similar
-to those used to implement cuBLAS. CUTLASS decomposes these “moving parts” into
+to those used to implement cuBLAS. CUTLASS decomposes these "moving parts" into
 reusable, modular software components abstracted by C++ template classes. These
 thread-wide, warp-wide, block-wide, and device-wide primitives can be specialized
 and tuned via custom tiling sizes, data types, and other algorithmic policy. The
@@ -20,6 +20,13 @@ point (FP64) types. Furthermore, CUTLASS demonstrates CUDA's WMMA API for targe
 the programmable, high-throughput _Tensor Cores_ provided by NVIDIA's Volta architecture
 and beyond.
 
+
+
+CUTLASS is very efficient, with performance comparable to cuBLAS for scalar GEMM
+computations. The above figure shows CUTLASS performance relative to cuBLAS
+compiled with CUDA 9.0 running on an NVIDIA Tesla V100 GPU for large matrix
+dimensions (M=10240, N=K=4096).
+
 For more exposition, see our Parallel Forall blog post ["CUTLASS: Fast Linear Algebra
 in CUDA C++"](https://devblogs.nvidia.com/parallelforall/cutlass-linear-algebra-cuda).
 
BIN media/cutlass-performance-plot.png (Normal file)
Binary file not shown. Size: 39 KiB