#### Avoid calling functions "fast" or "optimized"
Putting words like "fast" or "optimized"
in the name of a function
assumes that the "fast" path is actually faster.
That might be true now, but later changes
(in the code, compilers, or GPU hardware)
might make it false. In that case,
your name could be unintentionally misleading.
Consider instead a name that briefly describes
the algorithm or feature that is relevant for optimization.
For example, `compute_on_host` is more meaningful
than `compute_slowly`, and computing on host
might be faster in some cases
(e.g., if the data are already on host
and the algorithm is not GPU-friendly).
CUTLASS code has not always followed this rule in the past.
Some functions and classes might have words like "fast" in their name.
New code should follow this rule, however.
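As a minimal sketch of this naming rule, compare the two names below. The function `sum_on_host` and the rejected name `sum_fast` are hypothetical and exist only for illustration; they are not part of CUTLASS.

```c++
#include <cassert>
#include <numeric>
#include <vector>

// Not preferred: the name makes a performance claim
// that later changes could make false.
// float sum_fast(std::vector<float> const& x);

// Preferred: the name states the feature that matters
// for optimization (where the computation runs).
// (sum_on_host is a hypothetical name for this sketch.)
inline float sum_on_host(std::vector<float> const& x)
{
  return std::accumulate(x.begin(), x.end(), 0.0f);
}
```

A caller who reads `sum_on_host` learns where the work happens and can decide whether that suits the data's location, without having to trust a speed claim.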
#### Avoid creating unconstrained templated functions with common names
See [C++ Core Guidelines T.47](https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#t47-avoid-highly-visible-unconstrained-templates-with-common-names):
"Avoid highly visible unconstrained templates
with common names."
Argument-dependent lookup (ADL) means that
if users call a function name without specifying the namespace,
the compiler also searches the namespaces
associated with the argument types
for overloads of that function.
This can lead to ambiguous overloads in users' code,
just because they happened to include one of your header files
that exposes an unconstrained function template.
The following illustrates this
with an unconstrained swap overload in the `cutlass` namespace.
```c++
#include <cassert>
#include <memory>
#include <utility>

// Uncomment the line below to observe unwarranted build errors.
//#define BAD_CUTLASS_SWAP 1

namespace cutlass {
struct Bar {
  float f;
};
} // namespace cutlass

#ifdef BAD_CUTLASS_SWAP
namespace cutlass {

template<class T>
void swap(T& a, T& b) // don't do this
{
  T tmp = a;
  a = b;
  b = tmp;
}

} // namespace cutlass
#endif // BAD_CUTLASS_SWAP

namespace other {

#ifdef BAD_CUTLASS_SWAP
using cutlass::swap;
#endif // BAD_CUTLASS_SWAP

// Imagine for the sake of this example
// that "foo" is a less common name,
// and that T is constrained via
// std::enable_if or a requires clause.
template<class T>
void foo(T& a, T& b)
{
  // The usual idiom for using std::swap is the "swap two-step":
  //
  // 1. import std::swap into the current scope, then
  // 2. call swap without namespace qualification.
  //
  // That won't build if we have another swap
  // overload available in the scope already.

  using std::swap;
  swap(a, b); // OBSERVE UNWARRANTED BUILD ERROR HERE
}

} // namespace other

int main()
{
  int x = 42;
  int y = 43;
  other::foo(x, y);
  assert(x == 43);
  assert(y == 42);

  cutlass::Bar a{42.0};
  cutlass::Bar b{43.0};
  other::foo(a, b);
  assert(a.f == 43.0);
  assert(b.f == 42.0);

  // GCC 7.5 std::unique_ptr::reset calls swap,
  // leading to the same issue as above.
  // GCC 12.2's implementation of std::unique_ptr
  // does not have this issue.  Nevertheless,
  // breaking the swap two-step will break users' code,
  // just by them happening to include your headers.
  auto ptr = std::make_unique<cutlass::Bar>(cutlass::Bar{666.0f});
  ptr.reset(new cutlass::Bar{777.0f}); // OBSERVE UNWARRANTED BUILD ERROR HERE

  return 0;
}
```
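If a type really does need its own `swap`, one common alternative to an unconstrained template is a non-template overload declared as a "hidden friend" of that type. The sketch below assumes a hypothetical namespace `cutlass_example` (not the real `cutlass` namespace) and is only one possible approach. Because overload resolution prefers a non-template function to an equally good function template, the swap two-step stays unambiguous.

```c++
#include <cassert>
#include <utility>

namespace cutlass_example { // hypothetical namespace for this sketch

struct Bar {
  float f;

  // Hidden friend: a non-template overload that only ADL can find,
  // and only when at least one argument is a Bar.
  friend void swap(Bar& a, Bar& b) noexcept
  {
    float tmp = a.f;
    a.f = b.f;
    b.f = tmp;
  }
};

} // namespace cutlass_example

namespace other {

template<class T>
void foo(T& a, T& b)
{
  // The swap two-step still builds:
  // for int, only std::swap is a candidate;
  // for Bar, the non-template friend is preferred over std::swap.
  using std::swap;
  swap(a, b);
}

} // namespace other
```

Calling `other::foo` with two `int` values or with two `cutlass_example::Bar` values compiles and exchanges them in both cases.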
#### Function return values and in-out parameters
##### Prefer return values to output parameters
In general, avoid using in-out mutable references to return a value.
If you need to return multiple values,
you can return them by `struct` or `tuple`,
rather than by output references.
This includes the special case of error reporting
by returning either a value or an error code.
Please see the next section for details.
```c++
// Instead of passing in-out mutable references ...
void not_preferred(float& input_and_output); // not preferred
// keep functions pure and return value types instead
float preferred(float input); // preferred
```
##### Return multiple values by struct or tuple
Sometimes a function needs to return multiple values. In that case, consider the following, in decreasing order of preference.
1. Return a `struct`. This lets you name the fields
(for more self-documenting code),
yet still permits use of structured binding.
2. Return a `tuple`. If you need a tuple type
that works on device, use `cute::tuple`.
(Please note that `cute::tuple` does not work
for all the types that work in `std::tuple`.
CuTe's documentation explains.)
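Here is an example of the struct approach for named values.
For a comparable example in the C++ Standard,
please see [`std::allocate_at_least`](https://en.cppreference.com/w/cpp/memory/allocate_at_least),
which returns `std::allocation_result`.
```c++
// Illustrative result type for this example.
struct my_computation_result {
  float value = 0.0f;
  float relative_error = 0.0f;
  bool success = false;
};

my_computation_result my_computation(float tolerance);

void foo(float tolerance)
{
  // Approach 1: Unpack the result with structured binding.
  // The names on the left-hand side are your own choice,
  // so it is up to you to get their order right.
  auto [val, rel_err, ok] = my_computation(tolerance);

  // Approach 2: Keep the struct and use its named fields.
  // This approach prevents errors like mixing the order of return types.
  // However, it only works for structs, not for tuples.
  auto result = my_computation(tolerance);
  if (not result.success) {
    // computation did not succeed
  }
  else if (result.relative_error > tolerance) {
    // successful but relative error too large
  }
  else {
    // successful and relative error is in bounds
  }
}
```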
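For comparison, here is a minimal host-only sketch of the tuple approach using `std::tuple`. The function `my_tuple_computation` and the values it returns are hypothetical placeholders. Device code would need a device-capable tuple type, which this sketch does not cover.

```c++
#include <cassert>
#include <tuple>

// Hypothetical function returning (value, relative error, success).
// The tuple carries no field names, so the order is part of the contract.
inline std::tuple<float, float, bool> my_tuple_computation(float tolerance)
{
  float value = 1.0f;                      // placeholder result
  float relative_error = tolerance / 2.0f; // placeholder error estimate
  bool success = true;
  return {value, relative_error, success};
}
```

Callers can unpack the result with structured binding, `auto [val, rel_err, ok] = my_tuple_computation(tolerance);`, or index it with `std::get<2>(result)`. Neither form documents what each position means, which is why the struct approach ranks first.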
##### Reporting errors from a function that returns one or more values
We may want to return one or more values
from a function that could fail
or otherwise report errors.
That is, the function either
* returns one or more valid values, or
* does not return any values and reports an error,
but NOT BOTH. We contrast this with cases
when it's meaningful to report both a result
and whether the result is satisfactory.
For example, when solving
a system of nonlinear equations iteratively,
users may want the approximate computed solution,
even if the iteration did not succeed
by converging to the desired tolerance
in the desired number of steps.
(Users may want to invest more steps,
or use the current approximation
to jump-start a different algorithm.)
We're talking here about the "either valid value(s),
or error, but not both" case.
For this case, C++ offers a few options.
1. Return the value(s), or throw an exception on error
2. `std::expected` (requiring C++23) or something like it
3. `std::optional` (for a Boolean error state)
   or something like it
4. `std::variant` (a C++17 fall-back for `std::expected`)
   or something like it
5. C-style interface: return an error code,
   and "return" the values as output parameters
We usually cannot or do not want to
throw exceptions on device.
Some code projects forbid exceptions entirely
(on host or device)
and tell the compiler to disable them.
If we exclude a C-style interface (the last option)
as not idiomatic C++, then for host-only code,
`std::expected`, `std::optional`, and `std::variant`
all work.
For code that needs to build and run on device,
we can fall back to libcu++ equivalents
in the `cuda::std::` namespace, when they exist.
Otherwise, we must resort to returning a struct or tuple
with the value and the error information,
and ask users not to use the value on error.
This is acceptable if the value can be constructed
cheaply with a reasonable default.
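The following host-only sketch illustrates the `std::optional` and `std::variant` options with C++17. The function names and the `sqrt_error` enumeration are hypothetical and chosen only for this example.

```c++
#include <cassert>
#include <cmath>
#include <optional>
#include <variant>

// Hypothetical error codes for this sketch.
enum class sqrt_error { negative_input, not_a_number };

// std::optional: the caller learns only whether a value exists.
inline std::optional<float> checked_sqrt(float x)
{
  if (std::isnan(x) || x < 0.0f) {
    return std::nullopt;
  }
  return std::sqrt(x);
}

// std::variant: the caller gets either the value or the reason for failure,
// but never both.
inline std::variant<float, sqrt_error> checked_sqrt_or_error(float x)
{
  if (std::isnan(x)) {
    return sqrt_error::not_a_number;
  }
  if (x < 0.0f) {
    return sqrt_error::negative_input;
  }
  return std::sqrt(x);
}
```

A caller would test `has_value()` on the optional, or `std::holds_alternative<sqrt_error>` on the variant, before touching the value. In both cases the type itself enforces "either value or error, but not both."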
##### Performance of different value-or-error reporting methods
[P1886R0](https://wg21.link/P1886R0)
(Ben Craig, "Error speed benchmarking")
surveys different ways in Standard C++
to report errors from a function
that returns one or more values,
and compares their (host-only) performance
with different compilers.
##### Use aggregate initialization when returning a struct or tuple
Use aggregate initialization when returning a struct or tuple.
This avoids duplication of the return type name.
```c++
#include <span>

struct foo_result {
  float value = 0.0f;
  float error = 0.0f;
  bool success = false;
};

foo_result foo(std::span<const float> input)
{
  // ... code ...

  // Prefer this.  We know what type the function returns.
  return {val, err, ok}; // prefer this

  // Naming foo_result again here is unnecessary.
  // return foo_result{val, err, ok};
}
```
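However, note that this won't work if the function returns `auto`,
because the compiler cannot deduce a return type from a braced initializer list.
In that case, name the type in the `return` statement.
The general rule is to avoid code duplication.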
```c++
auto foo(std::span<const float> input)
{
  // ... code ...

  if constexpr (some_condition) {
    return foo_result{val, err, ok};
  }
  else {
    return bar_result{val, err, ok};
  }
}
```
##### Prefer using the actual return type to auto, if you know the type
C++ lets you use `auto` to deduce the type returned from a function.
* If you know the actual type, prefer using the type instead of `auto`.
* Use [Class Template Argument Deduction](https://en.cppreference.com/w/cpp/language/class_template_argument_deduction)
(CTAD) if you know that a function returns some type
(e.g., `Tensor`), but don't know the type's template arguments.
* Use `auto` in structured bindings (where you have to use it anyway). This also makes your code agnostic of whether the return type is a `struct`, `tuple`, `pair`, or other tuple-like type.
* Be careful using `auto` with types that provide expression templates.
Contrast this with "Almost Always Auto" (AAA) style.
We deliberately choose not to follow AAA style,
for the following reasons.
* Using the actual type when we know it can help prevent common loss-of-precision errors in mixed-precision computations, an important use case for CUTLASS.
* CTAD gives us much of the brevity of AAA, with more clarity.
* Using the actual type instead of `auto` can prevent common dangling errors with expression templates.
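
The precision and CTAD points above can be sketched as follows. All function names here are hypothetical, and `std::array` stands in for a class template such as `Tensor` whose template arguments we would rather not spell out.

```c++
#include <array>
#include <cassert>
#include <type_traits>

// With auto, the accumulator silently takes the type of its initializer.
inline auto accumulate_with_auto(float start, int count)
{
  auto acc = start; // deduced as float
  for (int i = 0; i < count; ++i) {
    acc += 1.0f;
  }
  return acc; // the return type is deduced as float, too
}

// Naming the type makes the intended accumulation precision explicit.
inline double accumulate_with_named_type(float start, int count)
{
  double acc = start;
  for (int i = 0; i < count; ++i) {
    acc += 1.0;
  }
  return acc;
}

// CTAD: we say that this is a std::array,
// and let the compiler deduce the element type and size.
inline float first_of_three()
{
  std::array values{1.0f, 2.0f, 3.0f};
  static_assert(std::is_same_v<decltype(values), std::array<float, 3>>);
  return values[0];
}
```

Starting from 16777216 (2^24), adding `1.0f` four times in `float` leaves the value unchanged, because 16777217 is not representable in `float` and rounds back down. The `double` accumulator reaches 16777220.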
#### Classes and structs
Type names use `CamelCase`.
That is, words start with capital letters.
The remaining letters in the word are lower case,
and words are joined with no intervening underscores.
The only exception is when implementations are
a drop-in replacement for C++ Standard Library components.