typename AccumulatorsPerThread_,
int kScalarsPerLdgA_ = 1,
int kScalarsPerLdgB_ = 1>
ThreadMultiplyAdd<AccumulatorsPerThread_, Shape<1, 4, 8>, float, float, float>,
int kScalarsPerLdgA_ = 1,
int kScalarsPerLdgB_ = 1,
typename Index_ = int,
typename GemmConfig_ =
typename GemmEpilogueTraits_ =
GemmEpilogue<GemmEpilogueTraits_>,
Defines iterators for efficiently loading and storing to global memory.
Defines structural properties of complete GEMM computation.
Definition: sgemm_traits.h:52
Template implementing matrix multiply-add operations on fragments.
Implements the epilogue phase of the GEMM kernel that efficiently updates global memory with the comp...
Defines iterators for efficiently loading and storing tiles to and from shared memory.
Definition: gemm_traits.h:79
A Shape implementing Layout Concept describing the dimensions of a cube.
Definition: shape.h:64
Definition: gemm_epilogue_traits.h:300
Kind
Definition: matrix_traits.h:36
Definition: sgemm_traits.h:112
Functor to compute linear combination of fragments.
Definition: linear_scaling.h:40
Implements a software-pipelined efficient GEMM.
Defines structural properties of the GEMM epilogue.
Definition: gemm_traits.h:723
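
The "functor to compute linear combination of fragments" described above can be illustrated with a minimal, standalone sketch. This is not the library's implementation: the names `Fragment` and `LinearCombination` are hypothetical, a fragment is assumed to be a fixed-size array of scalars, and the operation is assumed to be the usual GEMM epilogue form d = alpha * accum + beta * c, applied element-wise.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a per-thread fragment: a fixed-size array of scalars.
// (Assumption for illustration; not the library's fragment type.)
template <typename Scalar, std::size_t N>
using Fragment = std::array<Scalar, N>;

// Hypothetical functor computing d[i] = alpha * accum[i] + beta * c[i].
template <typename Scalar>
struct LinearCombination {
  Scalar alpha;  // scales the accumulated product
  Scalar beta;   // scales the source values read from the output matrix

  template <std::size_t N>
  Fragment<Scalar, N> operator()(const Fragment<Scalar, N>& accum,
                                 const Fragment<Scalar, N>& c) const {
    Fragment<Scalar, N> d{};
    for (std::size_t i = 0; i < N; ++i) {
      d[i] = alpha * accum[i] + beta * c[i];
    }
    return d;
  }
};
```

Keeping the scaling in a separate functor lets an epilogue stay generic over the element-wise update: a different functor type can be substituted through a template parameter without changing the code that loads and stores the fragments.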