| Cutlass: CUDA Templates for Linear Algebra Subroutines and Solvers |
Implements tile iterators that partition the thread block tile into 2D subtiles and load each one efficiently. Applies a permute transformation to construct an 'interleaved K-strided' data layout, in which the operands of 4-element dot products from the same K index occupy consecutive locations within shared memory.
#include "cutlass/coord.h"
#include "cutlass/gemm/gemm_global_tile.h"
#include "cutlass/matrix_traits.h"
| Classes | |
| struct | cutlass::gemm::IgemmGlobalTileTraits< kOperand_, kLayout_, Scalar_, Tile_, Threads_, kAccessSize_ > | 
| struct | cutlass::gemm::IgemmGlobalTileTraits< kOperand_, kLayout_, Scalar_, Tile_, Threads_, kAccessSize_ >::ThreadOffset | 
| Computes the thread offset in (H, W) based on thread ID. | |
| struct | cutlass::gemm::IgemmGlobalIteratorAb< TileTraits_, Index_ > | 
| Namespaces | |
| cutlass | |
| cutlass::gemm | |
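The `ThreadOffset` entry listed under Classes maps a linear thread ID to an (H, W) coordinate inside the tile. The sketch below shows one common way such a mapping can be written. It is not taken from the CUTLASS source: `Offset2D`, `thread_offset`, `threads_w`, and `access_size` are hypothetical names, and the assumption is that threads form a row-major grid `threads_w` wide, with each thread accessing `access_size` contiguous elements along W.

```cpp
// Hypothetical (H, W) coordinate type used only for this illustration.
struct Offset2D {
    int h;
    int w;
};

// Assumed mapping, not CUTLASS's actual implementation:
// - threads are laid out row-major in a grid that is threads_w wide,
// - each thread accesses access_size contiguous elements along W.
constexpr Offset2D thread_offset(int thread_id, int threads_w, int access_size) {
    return Offset2D{thread_id / threads_w,
                    (thread_id % threads_w) * access_size};
}
```

Under these assumptions, consecutive thread IDs touch adjacent `access_size`-wide chunks along W, which is the access pattern that usually lets global memory loads coalesce.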
Supports efficient loads from shared memory to target the DP4A instruction.
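To make the layout concrete, here is a minimal host-side sketch of why packing four K-adjacent 8-bit values into one 32-bit word suits a DP4A-style operation. It does not reproduce the header's code: `dp4a_emulated` and `interleave_k4` are hypothetical helpers, the source tile is assumed to be stored with element (k, m) at `src[k * m_extent + m]`, and `k_extent` is assumed to be a multiple of 4.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Host-side emulation of a DP4A-style operation: treat each 32-bit word as
// four signed 8-bit lanes, multiply lane by lane, and add the sum to c.
inline int32_t dp4a_emulated(int32_t a, int32_t b, int32_t c) {
    int8_t a_lanes[4];
    int8_t b_lanes[4];
    std::memcpy(a_lanes, &a, sizeof(a));
    std::memcpy(b_lanes, &b, sizeof(b));
    for (int i = 0; i < 4; ++i) {
        c += static_cast<int32_t>(a_lanes[i]) * static_cast<int32_t>(b_lanes[i]);
    }
    return c;
}

// Hypothetical permute step: for each column m, gather the four values at
// k, k+1, k+2, k+3 into the bytes of a single 32-bit word.
// Input:  element (k, m) at src[k * m_extent + m]  (assumed layout).
// Output: word (k / 4, m) at packed[(k / 4) * m_extent + m].
inline std::vector<int32_t> interleave_k4(const std::vector<int8_t>& src,
                                          int k_extent, int m_extent) {
    std::vector<int32_t> packed((k_extent / 4) * m_extent);
    for (int kg = 0; kg < k_extent / 4; ++kg) {
        for (int m = 0; m < m_extent; ++m) {
            int8_t lanes[4];
            for (int i = 0; i < 4; ++i) {
                lanes[i] = src[(kg * 4 + i) * m_extent + m];
            }
            std::memcpy(&packed[kg * m_extent + m], lanes, sizeof(lanes));
        }
    }
    return packed;
}
```

After the permutation, each 32-bit word holds one complete 4-element slice along K, so a single word-sized load per operand supplies everything one DP4A needs. On the GPU the corresponding intrinsic is `__dp4a`, available on devices of compute capability 6.1 and later.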