
* New updates. * Minor profiler updates Co-authored-by: Aniket Shivam <ashivam@nvidia.com>
299 lines
10 KiB
Markdown
299 lines
10 KiB
Markdown

|
|
|
|
[README](/README.md#documentation) > **Layouts and Tensors**
|
|
|
|
# Layouts and Tensors
|
|
|
|
_Tensors_ are mathematical objects represented by a multidimensional array of numeric elements in memory.
|
|
These may define two dimensional matrices upon which classical linear algebra computations may be defined or
|
|
higher dimensional objects frequently used to structure data used by Deep Learning applications and frameworks.
|
|
|
|
This document describes design patterns used in CUTLASS to map logical index spaces onto memory (Layouts) and to
|
|
indirectly reference tensors in memory (TensorRef and TensorView objects).
|
|
|
|
As described, CUTLASS adheres to the following terminology which is consistent with the C++ Standard Library.
|
|
|
|
* *size* (scalar): number of elements in a tensor
|
|
* *capacity* (scalar): number of elements needed to represent tensor in memory (may be larger than _size_)
|
|
* *rank* (scalar): number of logical dimensions describing tensor
|
|
* *extent* (vector): size of each logical dimension in a tensor
|
|
|
|
## CUTLASS Layout Concept
|
|
|
|
CUTLASS Layouts are a systematic design pattern for the following:
|
|
* Mapping _logical_ index space to _physical_ offsets in memory
|
|
* Storing the dynamic state needed in the above computation
|
|
* Defining a type system for partial specialization of other CUTLASS components
|
|
|
|
_Concept:_ layouts satisfy the following concept.
|
|
```c++
|
|
/// CUTLASS Layout concept example
|
|
struct LayoutConcept {
|
|
|
|
/// Logical rank of tensor
|
|
static int const kRank;
|
|
|
|
/// Rank of stride vector
|
|
static int const kStrideRank;
|
|
|
|
/// Index type used for coordinates
|
|
struct Index;
|
|
|
|
/// Long index type used for offsets
|
|
struct LongIndex;
|
|
|
|
/// Logical coordinate - satisfies Coord<kRank, ..>
|
|
struct TensorCoord;
|
|
|
|
/// Stride object - satisfies Coord<kStrideRank, ..>
|
|
struct Stride
|
|
|
|
//
|
|
// Methods
|
|
//
|
|
|
|
/// Constructor
|
|
CUTLASS_HOST_DEVICE
|
|
LayoutConcept();
|
|
|
|
/// Ctor
|
|
CUTLASS_HOST_DEVICE
|
|
LayoutConcept(Stride stride);
|
|
|
|
/// Helper returns a layout to a tightly packed tensor
|
|
CUTLASS_HOST_DEVICE
|
|
static LayoutConcept packed(TensorCoord const &extent);
|
|
|
|
/// Function call operator returns the offset of a coordinate in linear memory.
|
|
/// Assumes coordinate has convention (row, column)
|
|
CUTLASS_HOST_DEVICE
|
|
LongIndex operator()(TensorCoord const &coord) const;
|
|
|
|
/// Inverse of layout function, mapping linear offset to logical coordinate
|
|
CUTLASS_HOST_DEVICE
|
|
TensorCoord inverse(LongIndex offset) const;
|
|
|
|
/// Returns the stride of the layout
|
|
CUTLASS_HOST_DEVICE
|
|
Stride stride() const;
|
|
|
|
/// Returns the stride of the layout
|
|
CUTLASS_HOST_DEVICE
|
|
Stride & stride();
|
|
|
|
/// Compute the number of contiguous elements needed to store a tensor with the given size
|
|
CUTLASS_HOST_DEVICE
|
|
LongIndex capacity(TensorCoord const &extent) const;
|
|
};
|
|
```
|
|
|
|
_Layout_ objects generalize leading dimensions of matrices typical in _BLAS_ implementations. For example, cuBLAS assumes
|
|
Fortran-style _column-major_ layouts of matrices and refers to this as the matrix's "leading dimension."
|
|
|
|
```c++
|
|
cublasGemmEx(
|
|
...
|
|
ptr_A, // pointer to first element of matrix A
|
|
lda, // leading dimension
|
|
...
|
|
);
|
|
```
|
|
This implies an element at coordinate (_row_, _column_) has offset `row + lda * column`.
|
|
|
|
This is equivalently represented by CUTLASS's `layout::ColumnMajor` type as follows.
|
|
```c++
|
|
|
|
layout::ColumnMajor layout(lda);
|
|
|
|
int offset = layout({row, column}); // returns row + lda * column
|
|
```
|
|
|
|
Other layout functions are possible such as row-major:
|
|
```c++
|
|
|
|
layout::RowMajor layout(lda);
|
|
|
|
int offset = layout({row, column}); // returns lda * row + column
|
|
```
|
|
|
|
In both cases, the _logical_ coordinate (_row_, _column_) is represented by the same object. This enables an algorithm to be
|
|
implemented as generic template, with locations within tensors always specified in logical space. _Layout_ objects map this to
|
|
physical offsets in memory.
|
|
|
|
The layout's `::packed()` static method may be used to construct a layout object given the extent of a densely packed tensor.
|
|
This method is needed when an algorithm must define a buffer of arbitrary layout.
|
|
|
|
Example:
|
|
```c++
|
|
|
|
typename ArbitraryLayout::TensorCoord extent = make_Coord(...);
|
|
typename ArbitraryLayout::TensorCoord coord;
|
|
|
|
ArbitraryLayout layout = ArbitraryLayout::packed(extent);
|
|
|
|
int offset = layout({coord});
|
|
```
|
|
|
|
The layout's `::capacity()` method computes the number of locations in memory needed to represent a tensor. This is
|
|
useful when allocating memory, as more storage may be needed than what is strictly necessary for a fully packed
|
|
tensor.
|
|
|
|
Example:
|
|
```c++
|
|
|
|
int lda = columns + padding;
|
|
MatrixCoord extent{rows, columns};
|
|
|
|
layout::RowMajor layout(lda);
|
|
|
|
auto capacity = layout.capacity(extent); // returns rows * (columns + padding)
|
|
```
|
|
|
|
## Accessing elements within a tensor
|
|
|
|
### TensorRef
|
|
|
|
`TensorRef<class T, class Layout>` is a structure containing both a pointer to the start of a
|
|
tensor and a layout object to access its elements. This is a convenient object which may be
|
|
passed to functions to limit an explosion of arguments when the number of stride elements is
|
|
numerous.
|
|
|
|
Example:
|
|
```c++
|
|
int4_t *ptr = ...;
|
|
int ldm = ...;
|
|
|
|
int row = ...;
|
|
int column = ...;
|
|
|
|
layout::ColumnMajor layout(ldm);
|
|
TensorRef<int4_t, layout::ColumnMajor> ref(ptr, layout);
|
|
|
|
int4_t x = ref.at({row, column}); // loads a 4-bit signed integer from the tensor
|
|
|
|
ref.at({row, column}) = x * 2_s4; // transforms this quantity and stores it back
|
|
```
|
|
|
|
### TensorView
|
|
|
|
Matrices and tensors used in linear algebra computations are invariably finite. `TensorView<class T, class Layout>` extends `TensorRef<>` by
|
|
adding an `extent` vector to describe the logical extent of the tensor or matrix.
|
|
|
|
Example:
|
|
```c++
|
|
int4_t *ptr = ...;
|
|
int ldm = ...;
|
|
MatrixCoord extent = ...;
|
|
|
|
int row = ...;
|
|
int column = ...;
|
|
|
|
layout::ColumnMajor layout(ldm);
|
|
TensorView<int4_t, layout::ColumnMajor> view(ptr, layout, extent);
|
|
|
|
MatrixCoord coord = {row, column};
|
|
|
|
if (view.contains(coord)) { // verify coordinate is in bounds before performing access
|
|
|
|
int4_t x = ref.at(coord);
|
|
ref.at({row, column}) = x * 2_s4;
|
|
}
|
|
|
|
```
|
|
|
|
A `TensorView<>` may be constructed from a `TensorRef<>` succinctly as follows:
|
|
```c++
|
|
layout::ColumnMajor layout(ldm);
|
|
TensorRef<int4_t, layout::ColumnMajor> ref(ptr, layout);
|
|
|
|
TensorView<int4_t, layout::ColumnMajor> view(ref, extent); // construct TensorView from TensorRef and extent
|
|
```
|
|
|
|
Note, computations avoid becoming overdetermined by accepting a single problem size component
|
|
and `TensorRef` objects for each of the operands whose extents are implied as a precondition of the operation. By avoiding
|
|
redundant storage of extent quantities, CUTLASS minimizes capacity utilization of precious resources such as constant memory.
|
|
This is consistent with BLAS conventions.
|
|
|
|
# Summary:
|
|
|
|
The design patterns described in this document form a hierarchy:
|
|
* `T *ptr;` is a pointer to a contiguous sequence of elements of type `T`
|
|
* `Layout layout;` is an object mapping an index space to a linear offset
|
|
* `TensorRef<T, Layout> ref(ptr, layout);` is an object pointing to an _unbounded_ tensor containing elements of type `T` and a layout of type `Layout`
|
|
* `TensorView<T, Layout> view(ref, extent);` is an object pointing to a _bounded_ tensor containing elements of type `T` and a layout of type `Layout`
|
|
|
|
# Appendix: Existing Layouts
|
|
|
|
This section enumerates several existing Layout types defined in CUTLASS.
|
|
|
|
Matrix layouts:
|
|
- `PitchLinear`: data layout defined by _contiguous_ and _strided_ dimensions. _contiguous_ refers to consecutive elements in memory, where as _strided_ refers to data separated by a uniform stride
|
|
-- Rank: 2
|
|
-- TensorCoord type: `PitchLinearCoord`
|
|
-- Shape type: `PitchLinearShape`
|
|
-- Stride rank: 1
|
|
|
|
- `ColumnMajor`: data layout defined by _rows_ and _columns_ dimensions. Can be mapped to `PitchLinear` by: (_contiguous_ = _rows_, _strided_ = _columns_)
|
|
-- Rank: 2
|
|
-- TensorCoord type: `MatrixCoord`
|
|
-- Shape type: `MatrixShape`
|
|
-- Stride rank: 1
|
|
|
|
- `RowMajor`: data layout defined by _rows_ and _columns_ dimensions. Can be mapped to `PitchLinear` by: (_contiguous_ = _columns_, _strided_ = _rows_)
|
|
-- Rank: 2
|
|
-- TensorCoord type: `MatrixCoord`
|
|
-- Shape type: `MatrixShape`
|
|
-- Stride rank: 1
|
|
|
|
- `ColumnMajorInterleaved<k>`: data layout defined by _rows_ and _columns_ dimensions. Data is packed into a 'column-major' arrangement of row vectors of fixed length.
|
|
-- Rank: 2
|
|
-- TensorCoord type: `MatrixCoord`
|
|
-- Shape type: `MatrixShape`
|
|
-- Stride rank: 1
|
|
|
|
- `RowMajorInterleaved<k>`: data layout defined by _rows_ and _columns_ dimensions. Data is packed into a 'row-major' arrangement of column vectors of fixed length.
|
|
-- Rank: 2
|
|
-- TensorCoord type: `MatrixCoord`
|
|
-- Shape type: `MatrixShape`
|
|
-- Stride rank: 1
|
|
|
|
Tensor layouts:
|
|
- `TensorNHWC`:
|
|
|
|
Permuted Shared Memory Layouts:
|
|
- `TensorOpCongruous<ElementSize>`
|
|
- `TensorOpCrosswise<ElementSize>`
|
|
|
|
|
|
# Copyright
|
|
|
|
Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
|
|
SPDX-License-Identifier: BSD-3-Clause
|
|
|
|
```
|
|
Redistribution and use in source and binary forms, with or without
|
|
modification, are permitted provided that the following conditions are met:
|
|
|
|
1. Redistributions of source code must retain the above copyright notice, this
|
|
list of conditions and the following disclaimer.
|
|
|
|
2. Redistributions in binary form must reproduce the above copyright notice,
|
|
this list of conditions and the following disclaimer in the documentation
|
|
and/or other materials provided with the distribution.
|
|
|
|
3. Neither the name of the copyright holder nor the names of its
|
|
contributors may be used to endorse or promote products derived from
|
|
this software without specific prior written permission.
|
|
|
|
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
|
|
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
|
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
|
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
|
|
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
|
|
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
|
|
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
|
|
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
|
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
|
|
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
|
```
|