diff --git a/media/docs/cute/02_layout_algebra.md b/media/docs/cute/02_layout_algebra.md
index 3b70252b..0b5e76c5 100644
--- a/media/docs/cute/02_layout_algebra.md
+++ b/media/docs/cute/02_layout_algebra.md
@@ -390,12 +390,12 @@ The elements NOT pointed to by `B` sounds like a complement, `B*`, up to the siz
 
 ### Logical Divide 1-D Example
 
-Consider tiling the 1-D layout `A = (2,4,3):(4,1,8)` with the tiler `B = 4:2`. Informally, this means that we have a 1-D vector of 24 elements in some storage order defined by `A` and we want to extract tiles of 4 elements strided by 2.
+Consider tiling the 1-D layout `A = (4,2,3):(2,1,8)` with the tiler `B = 4:2`. Informally, this means that we have a 1-D vector of 24 elements in some storage order defined by `A` and we want to extract tiles of 4 elements strided by 2.
 
 This is computed in the three steps described in the implementation above.
 * Complement of `B = 4:2` under `size(A) = 24` is `B* = (2,3):(1,8)`.
 * Concatenation of `(B,B*) = (4,(2,3)):(2,(1,8))`.
-* Composition of `A = (2,4,3):(4,1,8)` with `(B,B*)` is then `((2,2),(2,3)):((4,1),(2,8))`.
+* Composition of `A = (4,2,3):(2,1,8)` with `(B,B*)` is then `((2,2),(2,3)):((4,1),(2,8))`.
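The corrected layouts in the hunk above can be brute-force checked outside of CuTe. The following standalone Python sketch (the helper `layout_fn` is ad hoc, not part of the CuTe API) evaluates each flat layout as a coordinate-to-offset function and confirms both that `(B,B*)` covers all 24 offsets and that composing `A` with it yields the claimed result:

```python
# Hypothetical helper (not CuTe): a flat layout shape:stride as a map from a
# 1-D coordinate to an offset, with the leftmost mode varying fastest.
def layout_fn(shape, stride):
    def f(n):
        off = 0
        for s, d in zip(shape, stride):
            off += (n % s) * d
            n //= s
        return off
    return f

A      = layout_fn((4, 2, 3), (2, 1, 8))        # A = (4,2,3):(2,1,8)
BBstar = layout_fn((4, 2, 3), (2, 1, 8))        # (B,B*) = (4,(2,3)):(2,(1,8)), flattened
R      = layout_fn((2, 2, 2, 3), (4, 1, 2, 8))  # claimed ((2,2),(2,3)):((4,1),(2,8))

# (B,B*) is a bijection on 0..23, and A composed with (B,B*) equals R.
assert sorted(BBstar(n) for n in range(24)) == list(range(24))
assert all(A(BBstar(n)) == R(n) for n in range(24))
```

That `(B,B*)` flattens to the same shape and stride as `A` here is a coincidence of this example, not a general property.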
@@ -280,8 +284,8 @@ To implement generic partitioning of a `Tensor`, we apply composition or tiling
Let's take a tiled example and look at how we can slice it in useful ways.
```cpp
-Tensor A = make_tensor(ptr, make_shape(24,8)); // (8,24)
-auto tiler = Shape<_8,_4>{}; // (_4,_8)
+Tensor A = make_tensor(ptr, make_shape(8,24)); // (8,24)
+auto tiler = Shape<_4,_8>{}; // (_4,_8)
Tensor tiled_a = zipped_divide(A, tiler); // ((_4,_8),(2,3))
```
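The shape bookkeeping in the corrected `zipped_divide` call can be sanity-checked with a shape-only model. This plain-Python sketch (not CuTe; it models only sizes, not strides) shows how an `(8,24)` tensor tiled by `(_4,_8)` yields `((_4,_8),(2,3))`:

```python
# Shape-only model of zipped_divide: mode i of size n tiled by t splits into
# (t, n // t); tile modes are zipped into the first result mode and the
# remaining "rest" modes into the second.
def zipped_divide_shape(shape, tiler):
    assert all(n % t == 0 for n, t in zip(shape, tiler)), "tiler must divide shape"
    return (tuple(tiler), tuple(n // t for n, t in zip(shape, tiler)))

print(zipped_divide_shape((8, 24), (4, 8)))  # ((4, 8), (2, 3))
```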
@@ -313,7 +317,7 @@ Another common partitioning strategy is called a thread-value partitioning. In t
// to 1D coordinates within a 4x8 tensor
// (T8,V4) -> (M4,N8)
auto tv_layout = Layout
@@ -415,7 +415,7 @@ Similar to the 2-D composition example above, consider a 2-D layout `A = (9,(4,8
The above figure depicts `A` as a 2-D layout with the elements pointed to by `B` highlighted in gray. The layout `B` describes our "tile" of data, and there are twelve of those tiles in `A` shown by each of the colors. After the divide, the first mode of each mode of the result is the tile of data and the second mode of each mode iterates over each tile. In that sense, this operation can be viewed as a kind of `gather` operation or as simply a permutation on the rows and cols.
-Note that the first mode of each mode of the result is the sublayout `(3,(2,4)):(236,(13,52))` and is precisely the result we would have received if we had applied `composition` instead of `logical_divide`.
+Note that the first mode of each mode of the result is the sublayout `(3,(2,4)):(177,(13,2))` and is precisely the result we would have received if we had applied `composition` instead of `logical_divide`.
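The corrected sublayout can also be checked per mode. Assuming, from the surrounding section of this document, that `A = (9,(4,8)):(59,(13,1))` and the tiler modes are `3:3` and `(2,4):(1,8)` (the strides are not visible in this hunk, so treat them as reconstructed), a small Python sketch confirms `(3,(2,4)):(177,(13,2))`:

```python
# Ad hoc helper (not CuTe): a flat layout shape:stride as a coordinate -> offset
# map, with the leftmost mode varying fastest.
def layout_fn(shape, stride):
    def f(n):
        off = 0
        for s, d in zip(shape, stride):
            off += (n % s) * d
            n //= s
        return off
    return f

# Mode 0: 9:59 composed with 3:3 should give 3:177.
A0, B0, R0 = layout_fn((9,), (59,)), layout_fn((3,), (3,)), layout_fn((3,), (177,))
assert all(A0(B0(n)) == R0(n) for n in range(3))

# Mode 1: (4,8):(13,1) composed with (2,4):(1,8) should give (2,4):(13,2).
A1, B1, R1 = (layout_fn((4, 8), (13, 1)),
              layout_fn((2, 4), (1, 8)),
              layout_fn((2, 4), (13, 2)))
assert all(A1(B1(n)) == R1(n) for n in range(8))
```

Note that the old value `(236,(13,52))` is what one would get from composing with the wrong stride arithmetic; the per-mode check above only passes with the corrected strides.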
### Zipped, Tiled, Flat Divides
diff --git a/media/docs/cute/03_tensor.md b/media/docs/cute/03_tensor.md
index c44f282a..35c2e6f2 100644
--- a/media/docs/cute/03_tensor.md
+++ b/media/docs/cute/03_tensor.md
@@ -157,8 +157,8 @@ Tensor rmem_4x8_col = make_tensor