so that layouts can change, without users' code needing to change. For example, a row-major MxN layout and a column-major MxN layout can be treated identically in software.
CuTe makes great use of dynamic (known only at run-time) and static (known at compile-time) integers.
* Dynamic integers (or "run-time integers") are just ordinary integral types like `int` or `size_t` or `uint16_t`. Anything that is accepted by `std::is_integral<T>` is considered a dynamic integer in CuTe.
* Static integers (or "compile-time integers") are instantiations of types like `std::integral_constant<Value>`. These types encode the value as a `static constexpr` member. They also support casting to their underlying dynamic types, so they can be used in expressions with dynamic integers. CuTe defines its own CUDA-compatibe static integer types `cute::C<Value>` along with overloaded math operators so that math on static integers results in static integers. CuTe defines shortcut aliases `Int<1>`, `Int<2>`, `Int<3>` and `_1`, `_2`, `_3` as conveniences, which you should see often within examples.
CuTe attempts to handle static and dynamic integers identically. In the examples that follow, all dynamic integers could be replaced with static integers and vice versa. When we say "integer" in CuTe, we almost always mean a static OR dynamic integer.
CuTe provides a number of traits to work with integers.
*`cute::is_integral<T>`: Checks whether `T` is a static or dynamic integer type.
*`cute::is_std_integral<T>`: Checks whether `T` is a dynamic integer type. Equivalent to `std::is_integral<T>`.
*`cute::is_static<T>`: Checks whether `T` is an empty type (so instantiations cannot depend on any dynamic information). Equivalent to `std::is_empty`.
*`cute::is_constant<N,T>`: Checks that `T` is a static integer AND its value is equivalent to `N`.
See the [`integral_constant` implementations](../../../include/cute/numeric/integral_constant.hpp) for more information.
### Tuple
A tuple is a finite ordered list of zero or more elements.
The [`cute::tuple` class](../../../include/cute/container/tuple.hpp) behaves like `std::tuple`, but works on device and host. It imposes restrictions on its template arguments and strips down the implementation for performance and simplicity.
*`depth(IntTuple)`: The number of hierarchical `IntTuple`s. A single integer has depth 0, a tuple of integers has depth 1, a tuple that contains a tuple of integers has depth 2, etc.
We write `IntTuple`s with parentheses to denote the hierarchy. For example, `6`, `(2)`, `(4,3)`, and `(3,(6,2),8)` are all `IntTuple`s.
### Shapes and Strides
Both `Shape` and `Stride` are `IntTuple` concepts.
### Layout
A `Layout` is a tuple of (`Shape`, `Stride`).
Semantically, it implements a mapping from
any coordinate within the Shape to an index via the Stride.
### Tensor
A `Layout` can be composed with data -- e.g., a pointer or an array -- to create a `Tensor`. The index generated by the `Layout` is used to subscript an iterator to retrieve the appropriate data. For details on `Tensor`, please refer to the
[`Tensor` section of the tutorial](./03_tensor.md).
A `Layout` is a pair of `IntTuple`s: the `Shape` and the `Stride`. The first element defines the abstract *shape* of the `Layout`, and the second element defines the *strides*, which map from coordinates within the shape to the index space.
*`depth(Layout)`: The depth of the `Layout`'s shape. A single integer has depth 0, a tuple of integers has depth 1, a tuple of tuples of integers has depth 2, etc.
inside of nested `IntTuple` or `Layout` more easily.
For example, we permit `get<I...>(x)`, where `I...` is a "C++ parameter pack" that denotes zero or more (integer) template arguments. These hierarchical access functions include the following.
*`get<I0,I1,...,IN>(x) := get<IN>(...(get<I1>(get<I0>(x)))...)`. Extract the `IN`th of the ... of the `I1`st of the `I0`th element of `x`.
It deduces the types of the function's arguments and returns a `Layout` with the appropriate template arguments.
Similarly, the `make_shape` and `make_stride` functions
return a `Shape` resp. `Stride`.
CuTe often uses these `make_*` functions
due to restrictions around constructor template argument deduction (CTAD) and to avoid having to repeat static or dynamic integer types.
When the `Stride` argument is omitted, it is generated from the provided `Shape` with `LayoutLeft` as default. The `LayoutLeft` tag constructs strides as an exclusive prefix product of the `Shape` from left to right, without regard to the `Shape`'s hierarchy. This can be considered a "generalized column-major stride generation". The `LayoutRight` tag constructs strides as an exclusive prefix product of the `Shape` from right to left, without regard to the `Shape`'s hierarchy. For shapes of depth one, this can be considered a "row-major stride generation", but for hierarchical shapes the resulting strides may be surprising. For example, the strides of `s2xh4` above could be generated with `LayoutRight`.
Calling `print` on each layout above results in the following
```
s8 : _8:_1
d8 : 8:_1
s2xs4 : (_2,_4):(_1,_2)
s2xd4 : (_2,4):(_1,_2)
s2xd4_a : (_2,4):(_12,_1)
s2xd4_col : (_2,4):(_1,_2)
s2xd4_row : (_2,4):(4,_1)
s2xh4 : (2,(2,2)):(4,(2,1))
s2xh4_col : (2,(2,2)):(_1,(2,4))
```
The `Shape:Stride` notation is used quite often for `Layout`. The `_N` notation is shorthand for a static integer while other integers are dynamic integers. Observe that both `Shape` and `Stride` may be composed of both static and dynamic integers.
Also note that the `Shape` and `Stride` are assumed to be *congruent*. That is, `Shape` and `Stride` have the same tuple profiles. For every integer in `Shape`, there is a corresponding integer in `Stride`. This can be asserted with
```cpp
static_assert(congruent(my_shape, my_stride));
```
### Using a Layout
The fundamental use of a `Layout` is to map between coordinate space(s) defined by the `Shape` and an index space defined by the `Stride`. For example, to print an arbitrary rank-2 layout in a 2-D table, we can write the function
```c++
template <classShape,classStride>
void print2D(Layout<Shape,Stride> const& layout)
{
for (int m = 0; m <size<0>(layout); ++m) {
for (int n = 0; n <size<1>(layout); ++n) {
printf("%3d ", layout(m,n));
}
printf("\n");
}
}
```
which produces the following output for the above examples.
We can see static, dynamic, row-major, column-major, and hierarchical layouts printed here. The statement `layout(m,n)` provides the mapping of
the logical 2-D coordinate (m,n) to the 1-D index.
Interestingly, the `s2xh4` example isn't row-major or column-major. Furthermore, it has three modes but is still interpreted as rank-2 and we're using a 2-D coordinate. Specifically, `s2xh4` has a 2-D multi-mode in the second mode, but we're still able to use a 1-D coordinate for that mode. More on this in the next section, but first we can generalize this another step. Let's use a 1-D coordinate and treat all of the modes of each layout as a single multi-mode. For instance, the following `print1D` function
```c++
template <classShape,classStride>
void print1D(Layout<Shape,Stride> const& layout)
{
for (int i = 0; i <size(layout);++i){
printf("%3d ", layout(i));
}
}
```
produces the following output for the above examples.
By the above rank-1 definition, we *also* interpret layout `((4,2)):((2,1))` as a vector, since its shape is rank-1. The inner shape looks like a 4x2 row-major matrix, but the extra pair of parenthesis suggest we can interpret those two modes as a 1-D 8-element vector. The strides tell us that the first `4` elements are strided by `2` and then there are `2` of those first elements strided by `1`.
is also logically 4x2, with stride-2 across the rows but a multi-stride down the columns. The first `2` elements down the column have a stride of `4` and then there is a copy of those with stride-1. Since this layout is logically 4x2,
With the notion of compatibility above, we emphasize that every `Layout` accepts multiple kinds of coordinates. Every `Layout` accepts coordinates for any `Shape` that is compatible with it. CuTe provides mappings between these sets of coordinates via a colexicographical order.
Thus, all Layouts provide two fundamental mappings:
The map from an input coordinate to a natural coordinate is the application of a colexicographical order (reading right to left, instead of "lexicographical," which reads left to right) within the `Shape`.
Take the shape `(3,(2,3))`, for example. This shape has three coordinate sets: the 1-D coordinates, the 2-D coordinates, and the natural (h-D) coordinates.
Each coordinate into the shape `(3,(2,3))` has two *equivalent* coordinates and all equivalent coordinates map to the same natural coordinate. To emphasize again, because all of the above coordinates are valid inputs, a Layout with Shape `(3,(2,3))` can be used as if it is a 1-D array of 18 elements by using the 1-D coordinates, a 2-D matrix of 3x6 elements by using the 2-D coordinates, or a h-D tensor of 3x(2x3) elements by using the h-D (natural) coordinates.
The previous 1-D print demonstrates how CuTe identifies 1-D coordinates with a colexicographical ordering of 2-D coordinates. Iterating from `i = 0` to `size(layout)` and indexing into our layout with the single integer coordinate `i`, traverses the 2-D coordinates in this "generalized-column-major" order, even if the layout maps coordinates to indices in a row-major or more complex fashion.
The function `cute::idx2crd(idx, shape)` is responsible for the coordinate mapping. It will take any coordinate within the shape and compute the equivalent natural coordinate for that shape.
The map from a natural coordinate to an index is performed by taking the inner product of the natural coordinate with the `Layout`'s `Stride`.
Take the layout `(3,(2,3)):(3,(12,1))`, for example. Then a natural coordinate `(i,(j,k))` will result in the index `i*3 + j*12 + k*1`. The indices this layout computes are shown in the 2-D table below where `i` is used as the row coordinate and `(j,k)` is used as the column coordinate.
The function `cute::crd2idx(c, shape, stride)` is responsible for the index mapping. It will take any coordinate within the shape, compute the equivalent natural coordinate for that shape (if it is not already), and compute the inner product with the strides.
* Every `Layout` has an h-D (natural) coordinate space where h is "hierarchical." These are ordered colexicographically and the enumeration of that order corresponds to the 1-D coordinates above. A natural coordinate is *congruent* to the `Shape` so that each element of the coordinate has a corresponding element of the `Shape`.
* The inner product of the elements of the natural coordinate with the elements of the `Stride` produces the resulting index.
For each `Layout` there exists an integral `Shape` that is that compatible with that `Layout`. Namely, that integral shape is `size(layout)`. We can then observe that
> Layouts are functions from integers to integers.
If you're familiar with the C++23 feature `mdspan`,
this is an important difference between
`mdspan` layout mappings and CuTe `Layout`s. In CuTe, `Layout` is a first class citizen, is natively hierarchical to naturally represent functions beyond row-major and column-major, and can similarly be indexed with a hierarchy of coordinates.
(`mdspan` layout mappings can represent hierarchical functions as well,
but this requires defining a custom layout.)
Input coordinates for an `mdspan` must have the same shape as the `mdspan`;
a multidimensional `mdspan` does not accept 1-D coordinates.