* Skip void-C kernels in the profiler when beta is nonzero
When beta is nonzero, the CUTLASS profiler currently skips only the
disposition of void-C kernels. It would make more sense not to run these
kernels in the first place.
Not all users are aware of void-C kernels (as far as I know, they did not
exist in 2.x), and not everyone remembers to filter out void-C kernels
when running the profiler with a nonzero beta.
The easiest solution (and, as far as I can tell, the correct way to handle
this) is for `can_implement` to return `false` when the kernel is void-C
but beta is nonzero (or whatever argument indicates an epilogue source
is set).
The profiler already skips running kernels that fail `can_implement`.
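The following is a minimal C++ sketch of the check described above, assuming a simplified epilogue. `Status`, `EpilogueArguments`, `EpilogueSketch`, and `kIsSourceSupported` are hypothetical names, not CUTLASS's actual API. In the sketch, a void-C kernel (`ElementC = void`) reports failure from `can_implement` whenever beta is nonzero.

```cpp
#include <type_traits>

// Hypothetical stand-ins for a CUTLASS-style status type and epilogue arguments.
enum class Status { kSuccess, kErrorInvalidProblem };

struct EpilogueArguments {
  float alpha;
  float beta;  // a nonzero beta means the epilogue must read source C
};

// Hypothetical epilogue: ElementC == void marks a void-C kernel with no source operand.
template <class ElementC>
struct EpilogueSketch {
  static constexpr bool kIsSourceSupported = !std::is_void<ElementC>::value;

  // Reject problems that need source C when the kernel cannot load it,
  // so a caller that filters on can_implement never runs this kernel.
  static constexpr Status can_implement(EpilogueArguments const& args) {
    return (args.beta != 0.0f && !kIsSourceSupported)
               ? Status::kErrorInvalidProblem
               : Status::kSuccess;
  }
};
```

Putting the check on the epilogue type itself means any caller that already filters on `can_implement` gets the behavior without extra logic.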
* Move checks to collectives instead
---------
Co-authored-by: Ali Hassani <ahassani@nvidia.com>
* `__cplusplus` can be inconsistent with `_MSVC_LANG` when detecting C++17, because MSVC reports `__cplusplus` as `199711L` unless `/Zc:__cplusplus` is passed. See https://github.com/NVIDIA/cutlass/issues/1474. Added a switch that checks `_MSVC_LANG` in addition to `__cplusplus`.
* Fixed typo.
* Oops, another typo.
* Fixed incorrect logic: changed `#ifndef` to `#ifdef`
* Define `CUTLASS_CPLUSPLUS` for language-version testing
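The following is a sketch of how a macro like `CUTLASS_CPLUSPLUS` can pick the right language-version value. It is not necessarily the exact definition in CUTLASS, and `EXAMPLE_HAS_CXX17` and `example_language_version` are hypothetical names used for illustration.

```cpp
// Prefer _MSVC_LANG when it exists: MSVC reports __cplusplus as 199711L
// unless /Zc:__cplusplus is used, while _MSVC_LANG follows the /std setting.
#if defined(_MSVC_LANG)
#  define CUTLASS_CPLUSPLUS _MSVC_LANG
#else
#  define CUTLASS_CPLUSPLUS __cplusplus
#endif

// Hypothetical feature check: compare against the macro, never __cplusplus directly.
#if CUTLASS_CPLUSPLUS >= 201703L
#  define EXAMPLE_HAS_CXX17 1
#else
#  define EXAMPLE_HAS_CXX17 0
#endif

// Hypothetical constant exposing the detected version to ordinary C++ code.
constexpr long example_language_version = CUTLASS_CPLUSPLUS;
```

On non-MSVC compilers this reduces to `__cplusplus`, so existing behavior there is unchanged.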
---------
Co-authored-by: Mark Hoemmen <mhoemmen@users.noreply.github.com>
* add missing header for `size_t` in `numeric_types.h`
* make NVRTC happy
* add missing header for integer types in `cutlass/arch/memory.h`
---------
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
* fix uint128 operator add for 64-bit hi/lo implementation
* add uint128 test for operator add
* make clang happy
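The following is a sketch of addition for a 128-bit integer stored as two 64-bit halves. `Uint128Sketch` is a hypothetical type, not `cutlass::uint128_t`. The essential step in a hi/lo add is propagating the carry out of the low half into the high half.

```cpp
#include <cstdint>

// Hypothetical 128-bit unsigned integer stored as low and high 64-bit halves.
struct Uint128Sketch {
  std::uint64_t lo;
  std::uint64_t hi;
};

// Unsigned addition wraps modulo 2^64, so the low-half sum is smaller than
// an operand exactly when a carry out of the low half occurred.
constexpr Uint128Sketch operator+(Uint128Sketch a, Uint128Sketch b) {
  return Uint128Sketch{
      a.lo + b.lo,
      a.hi + b.hi + ((a.lo + b.lo) < a.lo ? 1u : 0u)};
}
```

A test for operator add should include a case where the low half overflows, because that case is the only one that exercises the carry.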
---------
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
* Allow per-column bias in EpilogueTensorBroadcast
`EpilogueTensorBroadcast` only supports per-row vector broadcast because
the bias stride is hard-coded.
Making the bias stride conditional lets it support both per-row and
per-column bias, and defaulting to per-row preserves the original behavior.
* Add unit test for EpilogueTensorBroadcast with per-col bias
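The following is a host-side sketch of the conditional-stride idea, assuming a row-major M x N accumulator. `add_bias` and `PerColumnBias` are hypothetical names. This is not the `EpilogueTensorBroadcast` code, which works on CuTe layouts and strides.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: add a bias vector to a row-major M x N accumulator.
// Per-row bias: one value per row (length M). Per-column bias: one value
// per column (length N). Only the bias strides differ between the two.
template <bool PerColumnBias = false>  // default keeps the per-row behavior
std::vector<float> add_bias(std::vector<float> acc,
                            std::vector<float> const& bias,
                            std::size_t m, std::size_t n) {
  assert(acc.size() == m * n);
  assert(bias.size() == (PerColumnBias ? n : m));
  // Stride (1, 0) indexes the bias by row; (0, 1) indexes it by column.
  const std::size_t stride_row = PerColumnBias ? 0 : 1;
  const std::size_t stride_col = PerColumnBias ? 1 : 0;
  for (std::size_t i = 0; i < m; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      acc[i * n + j] += bias[i * stride_row + j * stride_col];
    }
  }
  return acc;
}
```

Because the default template argument selects per-row, existing callers that never mention the parameter keep their old behavior.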
---------
Co-authored-by: Ali Hassani <ahassanijr@gmail.com>
Co-authored-by: Ali Hassani <ali@hippoml.com>
* Fix inline PTX escaping for predicates.
Prevents `error: invalid % escape in inline assembly string` when compiling with Clang.
* More double-quoting.