▼Ncutlass | |
▶Ngemm | |
▶CClearAccumulators | |
CSharedStorage | The shared storage |
CDgemmConfig | |
CDgemmTraits | |
CFragmentMultiplyAdd | |
CFragmentMultiplyAdd< half > | |
▶CGemm | |
CParams | The params |
CGemmConfig | |
CGemmDesc | |
CGemmEpilogue | |
▶CGemmEpilogueTraits | |
CParams | The params |
CSharedStorage | The shared memory to swizzle the data in the epilogue |
CStreamSharedStorage | The shared memory storage to exchange data |
CGemmEpilogueTraitsHelper | |
▶CGemmGlobalIteratorAb | |
CParams | |
▶CGemmGlobalIteratorCd | |
CParams | The params |
▶CGemmGlobalTileCdTraits | |
CThreadOffset | Computes the thread offset in (H, W) based on thread ID |
▶CGemmGlobalTileTraits | |
CThreadOffset | Computes the thread offset in (H, W) based on thread ID |
CGemmMultiplicandTraits | |
CGemmOperandTraitsAb | Helper to describe attributes of GEMM matrix operands |
▶CGemmSharedLoadTileATraits | |
CThreadOffset | Computes the thread offset in (H, W) based on thread ID |
▶CGemmSharedLoadTileBTraits | |
CThreadOffset | Computes the thread offset in (H, W) based on thread ID |
▶CGemmSharedLoadTileDTraits | |
CThreadOffset | Computes the thread offset in (H, W) based on thread ID |
▶CGemmSharedStoreTileAbTraits | |
CThreadOffset | |
▶CGemmSharedStoreTileDTraits | |
CThreadOffset | Computes the thread offset in (H, W) based on thread ID |
▶CGemmSharedStoreWithSkewTileAbTraits | |
CThreadOffset | |
CGemmTileTraitsHelperA | |
CGemmTileTraitsHelperA< MatrixLayout::kColumnMajor, GemmConfig_ > | |
CGemmTileTraitsHelperA< MatrixLayout::kRowMajor, GemmConfig_ > | |
CGemmTileTraitsHelperB | |
CGemmTileTraitsHelperB< MatrixLayout::kColumnMajor, GemmConfig_ > | |
CGemmTileTraitsHelperB< MatrixLayout::kRowMajor, GemmConfig_ > | |
▶CGemmTraits | |
CGlobalLoadStream | Assemble the global load streams for A/B |
CMainLoopSharedStorage | |
CParams | The params |
CSharedLoadStream | Assemble the shared load stream for A/B |
CSharedStorage | The storage in shared memory |
CStreamSharedStorage | |
CGetExtent | |
CGetExtent< GemmOperand::kA, Tile_ > | |
CGetExtent< GemmOperand::kB, Tile_ > | |
CGlobalLoadStream | |
▶CGlobalLoadStreamBase | |
CParams | The params |
CSharedStorage | The storage in shared memory needed by that stream |
CHgemmConfig | |
▶CHgemmCrosswiseGlobalTileTraits | |
CThreadOffset | Computes the thread offset in (H, W) based on thread ID |
CHgemmSwizzle | |
CHgemmTileTraitsHelperA | |
CHgemmTileTraitsHelperA< MatrixLayout::kRowMajor, GemmConfig_ > | |
CHgemmTileTraitsHelperB | |
CHgemmTileTraitsHelperB< MatrixLayout::kColumnMajor, GemmConfig_ > | |
CHgemmTraits | |
CHgemmTraitsHelper | |
CHgemmTransformerA | |
CHgemmTransformerA< MatrixLayout::kColumnMajor, Iterator_ > | |
CHgemmTransformerA< MatrixLayout::kRowMajor, Iterator_ > | |
CHgemmTransformerB | |
CHgemmTransformerB< MatrixLayout::kColumnMajor, Iterator_ > | |
CHgemmTransformerB< MatrixLayout::kRowMajor, Iterator_ > | |
CIdentityBlockSwizzle | |
CIgemmConfig | |
CIgemmConfig< OutputTile_, int8_t, AccumulatorsPerThread_ > | |
▶CIgemmContiguousGlobalTileTraits | |
CThreadOffset | Computes the thread offset in (H, W) based on thread ID |
CIgemmEpilogue | |
CIgemmEpilogue< GemmEpilogueTraits_, true > | |
CIgemmEpilogueScalar | |
CIgemmEpilogueScalar< int > | |
CIgemmEpilogueTraits | |
CIgemmEpilogueTraitsHelper | |
CIgemmFloatToInt8Converter | |
CIgemmGlobalLoadTransformer | |
CIgemmGlobalLoadTransformer< Fragment< int8_t, kElements_ >, float > | |
CIgemmGlobalStoreTransformer | |
CIgemmGlobalStoreTransformer< float, Fragment< int8_t, kElements_ > > | |
CIgemmInt8ToFloatConverter | |
CIgemmSharedStoreTransformer | |
CIgemmSwizzle | |
CIgemmTileTraitsHelperA | |
CIgemmTileTraitsHelperA< MatrixLayout::kColumnMajor, GemmConfig_ > | |
CIgemmTileTraitsHelperB | |
CIgemmTileTraitsHelperB< MatrixLayout::kRowMajor, GemmConfig_ > | |
CIgemmTraits | |
CIgemmTraitsHelper | |
CIgemmTransformerA | |
CIgemmTransformerA< MatrixLayout::kColumnMajor, Iterator_ > | |
CIgemmTransformerA< MatrixLayout::kRowMajor, Iterator_ > | |
CIgemmTransformerB | |
CIgemmTransformerB< MatrixLayout::kColumnMajor, Iterator_ > | |
CIgemmTransformerB< MatrixLayout::kRowMajor, Iterator_ > | |
▶CLinearScaling | Functor to compute a linear combination of fragments |
CParams | The parameters |
CProjectOperand | |
CProjectOperand< GemmOperand::kA, Kstrided > | Project A operand - (0, K, M) |
CProjectOperand< GemmOperand::kB, Kstrided > | Project B operand - (0, K, N) |
CProjectOperand< GemmOperand::kC, true > | Project C operand - (0, N, M) |
CProjectOperand< GemmOperand::kD, true > | Project D operand - (0, N, M) |
CReshapeThreads | |
CReshapeThreads< Tile_, Threads_, true > | |
CSgemmConfig | |
CSgemmTraits | |
▶CSharedLoadStream | |
CParams | The params |
CSimplifiedGemmEpilogueTraits | |
CSimplifiedGemmTraits | |
CSimplifiedGemmTraitsHelper | |
CThreadMultiplyAdd | Template performing matrix multiply-add operation within a thread |
CThreadMultiplyAdd< AccumulatorsPerThread_, ThreadsPerWarp_, half, half, half > | Template performing matrix multiply-add operation within a thread |
CThreadMultiplyAdd< AccumulatorsPerThread_, ThreadsPerWarp_, int8_t, int8_t, int > | Template performing matrix multiply-add operation within a thread |
▶CWmmaGemmGlobalIteratorCd | |
CParams | The params |
▶CWmmaGemmGlobalIteratorCdTraits | |
CThreadOffset | Computes the thread offset in (H, W) based on thread ID |
▶Nplatform | |
Caligned_chunk | |
Caligned_storage | std::aligned_storage |
▶Calignment_of | std::alignment_of |
Cpad | |
Calignment_of< const value_t > | |
Calignment_of< const volatile value_t > | |
Calignment_of< double2 > | |
Calignment_of< double4 > | |
Calignment_of< float4 > | |
Calignment_of< int4 > | |
Calignment_of< long4 > | |
Calignment_of< longlong2 > | |
Calignment_of< longlong4 > | |
Calignment_of< uint4 > | |
Calignment_of< ulong4 > | |
Calignment_of< ulonglong2 > | |
Calignment_of< ulonglong4 > | |
Calignment_of< volatile value_t > | |
Cbool_constant | std::bool_constant |
Cconditional | std::conditional (true specialization) |
Cconditional< false, T, F > | std::conditional (false specialization) |
Cdefault_delete | Default deleter |
Cdefault_delete< T[]> | Partial specialization for deleting array types |
Cenable_if | std::enable_if (true specialization) |
Cenable_if< false, T > | std::enable_if (false specialization) |
Cgreater | std::greater |
Cintegral_constant | std::integral_constant |
Cis_arithmetic | std::is_arithmetic |
Cis_base_of | std::is_base_of |
▶Cis_base_of_helper | Helper for std::is_base_of |
Cdummy | |
Cis_floating_point | std::is_floating_point |
Cis_fundamental | std::is_fundamental |
Cis_integral | std::is_integral |
Cis_integral< char > | |
Cis_integral< const T > | |
Cis_integral< const volatile T > | |
Cis_integral< int > | |
Cis_integral< long > | |
Cis_integral< long long > | |
Cis_integral< short > | |
Cis_integral< signed char > | |
Cis_integral< unsigned char > | |
Cis_integral< unsigned int > | |
Cis_integral< unsigned long > | |
Cis_integral< unsigned long long > | |
Cis_integral< unsigned short > | |
Cis_integral< volatile T > | |
Cis_pointer | Std::is_pointer |
Cis_pointer_helper | Helper for std::is_pointer (false specialization) |
Cis_pointer_helper< T * > | Helper for std::is_pointer (true specialization) |
Cis_same | std::is_same (false specialization) |
Cis_same< A, A > | std::is_same (true specialization) |
Cis_trivially_copyable | |
Cis_void | std::is_void |
Cis_volatile | std::is_volatile |
Cis_volatile< volatile T > | |
Cless | std::less |
Cnullptr_t | std::nullptr_t |
Cplus | platform::plus |
Cremove_const | std::remove_const (non-const specialization) |
Cremove_const< const T > | std::remove_const (const specialization) |
Cremove_cv | std::remove_cv |
Cremove_volatile | std::remove_volatile (non-volatile specialization) |
Cremove_volatile< volatile T > | std::remove_volatile (volatile specialization) |
Cunique_ptr | std::unique_ptr |
CAlignedStruct | |
CComputeOffsetFromShape | Compute the offset for the given coordinates in a cube |
CComputeOffsetFromShape< Shape< 1, kSh_, kSw_, 1 > > | Compute the offset for the given coordinates in a cube with one channel and a depth of 1 |
CComputeOffsetFromShape< Shape< 1, kSh_, kSw_, kSc_ > > | Compute the offset for the given coordinates in a cube with a depth of 1 |
CComputeOffsetFromStrides | Compute the offset for the given coordinates in a cube |
CComputeOffsetFromStrides< Shape< 1, S_h_, S_w_, 1 > > | Compute the offset for the given coordinates in a cube with one channel and a depth of 1 |
CComputeOffsetFromStrides< Shape< 1, S_h_, S_w_, S_c_ > > | Compute the offset for the given coordinates in a cube with a depth of 1 |
CComputeThreadOffsetFromStrides | Decomposes threadIdx.x into the coordinates of a cube whose dimensions are specified by Threads_, then computes the offset of those coordinates using Strides_ |
CComputeThreadOffsetFromStrides< Shape< 1, T_h_, T_w_, 1 >, Shape< 1, S_h_, S_w_, 1 > > | Specialization for D=1 and C=1 |
CComputeThreadOffsetFromStrides< Shape< 1, T_h_, T_w_, T_c_ >, Shape< 1, S_h_, S_w_, S_c_ > > | Specialization for D=1 |
CConstPredicateTileAdapter | Adapter to enable random access to predicates via logical coordinate within a tile |
CConvert | |
CConvert< Fragment< InputScalar_, kScalars_ >, Fragment< OutputScalar_, kScalars_ > > | |
CCoord | Statically-sized array specifying Coords within a tensor |
CCopy | |
Cdivide_assert | |
CExtent | Returns the extent of a scalar or vector |
CExtent< Vector< T, Lanes > > | Returns the number of lanes of a vector if need be |
CExtent< Vector< T, Lanes > const > | Returns the number of lanes of a vector if need be |
CFragment | A template defining Fragment Concept |
CFragmentConstIterator | |
CFragmentIterator | A template defining Fragment Iterator Concept |
CFragmentLoad | |
CFragmentLoad< IteratorFragment::kScalar, kAccessSize, Scalar_, Memory_, FragmentElement_, kStride > | |
CFragmentLoad< IteratorFragment::kWmmaMatrix, kAccessSize, Scalar_, Memory_, FragmentElement_, kStride > | |
CFragmentStore | |
CFragmentStore< IteratorFragment::kScalar, kAccessSize, Scalar_, Memory_, FragmentElement_, kStride > | |
CFragmentStore< IteratorFragment::kWmmaMatrix, kAccessSize, Scalar_, Memory_, FragmentElement_, kStride > | |
CGemmOperand | Gemm operand - D = A * B + C |
CIdentity | Describes identity elements |
Cis_pow2 | |
CIteratorAdvance | Specifies the dimension in which post-increment accesses advance |
CIteratorFragment | Specifies whether the iterator storage fragment consists of scalar values or WMMA matrices |
CLoad | |
CLoad< double, 2, Memory_, true, 16 > | |
CLoad< Scalar_, Lanes_, Memory_, true, 16 > | |
CLoad< Scalar_, Lanes_, Memory_, true, 4 > | |
CLoad< Scalar_, Lanes_, Memory_, true, 8 > | |
Clog2_down | |
Clog2_down< N, 1, Count > | |
Clog2_up | |
Clog2_up< N, 1, Count > | |
CMatrixLayout | Describes layouts of matrices |
CMemorySpace | Enum to specify which memory space data resides in |
CPredicateTileAdapter | Adapter to enable random access to predicates via logical coordinate within a tile |
▶CPredicateVector | Statically sized array of bits implementing |
CConstIterator | A const iterator implementing Predicate Iterator Concept enabling sequential read-only access to predicates |
CIterator | An iterator implementing Predicate Iterator Concept enabling sequential read and write access to predicates |
CTrivialIterator | Iterator that always returns true |
CReshapeTile | |
CReshapeTile< Tile_, kAccessSize_, true > | |
CShape | A Shape implementing Layout Concept describing the dimensions of a cube |
CShapeAdd | |
CShapeCount | Computes derived counts of a Layout Concept-based class |
CShapeDiv | |
CShapeMax | |
CShapeMin | |
CShapeMul | |
CShapeScale | |
CShapeStrides | |
CShapeSub | |
Csqrt_est | |
CStorageType | |
CStorageType< 1 > | |
CStorageType< 2 > | |
CStorageType< 4 > | |
CStore | |
CStore< double, 2, Memory_, true, 16 > | |
CStore< Scalar_, Lanes_, Memory_, true, 16 > | |
CStore< Scalar_, Lanes_, Memory_, true, 4 > | |
CStore< Scalar_, Lanes_, Memory_, true, 8 > | |
CTensorRef | Structure modeling a pointer and stride into a tensor |
CTensorView | Host-side reference implementation of tensor operations |
CTiledThreadOffset | Basic thread offset function computed from a thread shape |
▶CTileIteratorBase | Iterator for accessing a stripmined tile in memory |
CParams | Parameters to the iterator |
▶CTileLoadIterator | An iterator implementing Tile Load Iterator Concept for loading a tile from memory |
CParams | Parameters |
▶CTileStoreIterator | An iterator implementing Tile Store Iterator Concept for storing a tile to memory |
CParams | Parameters |
CTileTraits | A template defining Tile Traits Concept |
CTileTraitsContiguousMajor | |
CTileTraitsStandard | Chooses the 'best' shape to enable warp raking along the contiguous dimension, if possible |
CTileTraitsStrideMajor | |
▶CTileTraitsWarpRake | Tiling in which warps rake across the contiguous dimension |
CThreadOffset | Computes the thread offset in (H, W) based on thread ID |
CTrivialPredicateTileAdapter | Always returns true predicate |
CVector | |
CVector< half, kLanes_ > | |
CVectorize | |
CVectorize< Element_, 1 > | |
CVectorTraits | Traits describing properties of vectors and scalar-as-vectors |
CVectorTraits< Vector< T, Lanes > > | Partial specialization for actual cutlass::Vector |
CVectorTraits< Vector< T, Lanes > const > | Partial specialization for actual cutlass::Vector |
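The ComputeThreadOffsetFromStrides entry describes a two-step computation: split a linear thread index into coordinates of a thread cube, then take the stride-weighted sum of those coordinates. The following is a minimal host-side C++14 sketch of the depth-1 case, not CUTLASS's implementation. `ShapeSketch` and `thread_offset_from_strides` are hypothetical names, the thread index is passed as a parameter instead of being read from `threadIdx.x`, and the channel-fastest coordinate ordering is an assumption.

```cpp
// Hypothetical host-side stand-in for a Shape<D, H, W, C>: only the four
// extents are modeled, nothing else from CUTLASS's Shape.
template <int kD_, int kH_, int kW_, int kC_>
struct ShapeSketch {
  static constexpr int kD = kD_;
  static constexpr int kH = kH_;
  static constexpr int kW = kW_;
  static constexpr int kC = kC_;
};

// Hypothetical helper covering the depth-1 case: decompose a linear thread ID
// into (h, w, c) coordinates of the Threads_ cube, with c varying fastest
// (an assumed ordering), then weight each coordinate by its stride.
template <typename Threads_, typename Strides_>
constexpr int thread_offset_from_strides(int thread_id) {
  static_assert(Threads_::kD == 1 && Strides_::kD == 1,
                "this sketch only handles a depth of 1");
  // Step 1: decompose the thread ID into cube coordinates.
  int c = thread_id % Threads_::kC;
  int w = (thread_id / Threads_::kC) % Threads_::kW;
  int h = (thread_id / (Threads_::kC * Threads_::kW)) % Threads_::kH;
  // Step 2: combine the coordinates with the strides.
  return h * Strides_::kH + w * Strides_::kW + c * Strides_::kC;
}

// Usage: 8 threads laid out as 2 rows x 4 columns with one channel; rows are
// 64 elements apart and neighboring columns 4 elements apart.
using Threads2x4 = ShapeSketch<1, 2, 4, 1>;
using Strides2x4 = ShapeSketch<1, 64, 4, 1>;
static_assert(thread_offset_from_strides<Threads2x4, Strides2x4>(0) == 0, "thread 0 at (0, 0)");
static_assert(thread_offset_from_strides<Threads2x4, Strides2x4>(3) == 12, "thread 3 at (0, 3)");
static_assert(thread_offset_from_strides<Threads2x4, Strides2x4>(5) == 68, "thread 5 at (1, 1)");
```

Making the helper `constexpr` lets the example offsets be checked at compile time. A device-side variant would presumably read `threadIdx.x` directly, as the class description says.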