diff --git a/media/docs/doxygen_mainpage.md b/media/docs/doxygen_mainpage.md
index 53620d99..26508548 100644
--- a/media/docs/doxygen_mainpage.md
+++ b/media/docs/doxygen_mainpage.md
@@ -33,7 +33,7 @@ CUTLASS 2.0 is a substantial refactoring from the previous version, intended to
 # Example CUTLASS GEMM
 
 The following illustrates an example function that defines a CUTLASS GEMM kernel
-with single-precision inputs and outputs. This is an exercpt from the CUTLASS SDK
+with single-precision inputs and outputs. This is an excerpt from the CUTLASS SDK
 [basic_gemm example](https://github.com/NVIDIA/cutlass/tree/master/examples/00_basic_gemm/basic_gemm.cu).
 
 ~~~~~~~~~~~~~~~~~~~~~{.cpp}
diff --git a/media/docs/quickstart.md b/media/docs/quickstart.md
index be274c7e..4d8bffc5 100644
--- a/media/docs/quickstart.md
+++ b/media/docs/quickstart.md
@@ -347,7 +347,7 @@ Note, the above could be simplified as follows using helper methods defined in `
 
 # CUTLASS Library
 
-The [CUTLASS Library](./tools/library) defines an API for managing and executing collections of compiled
+The [CUTLASS Library](../../tools/library) defines an API for managing and executing collections of compiled
 kernel instances and launching them from host code without template instantiations in client code.
 
 The host-side launch API is designed to be analogous to BLAS implementations for convenience, though its
diff --git a/media/docs/terminology.md b/media/docs/terminology.md
index 96d51639..36836d85 100644
--- a/media/docs/terminology.md
+++ b/media/docs/terminology.md
@@ -41,7 +41,7 @@ contiguous and strided dimensions of a tile.
 **Rank**: number of dimensions in a multidimensional index space, array, tensor, or matrix. Consistent with
 [C++ Standard Library](https://en.cppreference.com/w/cpp/types/rank)
 
-**Register**: in device code, registes are the most efficient storage for statically sized arrays of elements.
+**Register**: in device code, registers are the most efficient storage for statically sized arrays of elements.
 Arrays may be expected to be stored in registers if all accesses are made via constexpr indices or within
 fully unrolled loops.