From 57551902d00cbb2134d4a4ae72cc2241ae538524 Mon Sep 17 00:00:00 2001
From: Haicheng Wu <57973641+hwu36@users.noreply.github.com>
Date: Wed, 11 May 2022 00:01:19 -0400
Subject: [PATCH] Update functionality.md

Add some explanations to the functionality table.
---
 media/docs/functionality.md | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/media/docs/functionality.md b/media/docs/functionality.md
index b85b3a39..caa1139b 100644
--- a/media/docs/functionality.md
+++ b/media/docs/functionality.md
@@ -9,6 +9,23 @@
 The following table summarizes device-level GEMM kernels in CUTLASS, organized
 by opcode class, data type, and layout. Hyperlinks to relevant unit tests
 demonstrate how specific template instances may be defined.
+- N - Column Major Matrix
+- T - Row Major Matrix
+- {N,T} x {N,T} - All combinations, i.e. NN, NT, TN, TT
+- [NHWC](/include/cutlass/layout/tensor.h#L63-206) - 4-dimensional tensor used for convolution
+- [NCxHWx](/include/cutlass/layout/tensor.h#L290-395) - Interleaved 4-dimensional tensor used for convolution
+- f - floating point
+- s - signed integer
+- b - bit
+- cf - complex floating point
+- bf16 - bfloat16
+- tf32 - tfloat32
+- Simt - Use SIMT CUDA Core MMA
+- TensorOp - Use Tensor Core MMA
+- SpTensorOp - Use Sparse Tensor Core MMA
+- WmmaTensorOp - Use the WMMA abstraction to access Tensor Core MMA
+
+
 |**Opcode Class** | **Compute Capability** | **CUDA Toolkit** | **Data Type** | **Layouts** | **Unit Test** |
 |-----------------|------------------------|------------------|--------------------------------|------------------------|------------------|
 | **Simt**        | 50,60,61,70,75         | 9.2+             | `f32 * f32 + f32 => f32`       | {N,T} x {N,T} => {N,T} | [example](/test/unit/gemm/device/simt_sgemm_nt_sm50.cu) |