cutlass/include/cutlass
Ali Hassani 1f2b590da6
Skip void-C kernels in the profiler when beta is non zero (#1661)
* Skip void-C kernels in the profiler when beta is non zero

The CUTLASS profiler currently skips only disposition for void-C kernels
when beta is non-zero, when it makes more sense to skip running them in
the first place.

Not all users are aware of void-C kernels (as far as I know they weren't
a thing in 2.X), and not everyone remembers to filter out void-C kernels
when running the profiler with a non-zero beta.

The easiest solution (and, as far as I can tell, the correct way of
handling this) is for `can_implement` to return `false` when beta is
non-zero (or whatever argument indicates an epilogue source) and the
kernel is a void-C kernel.

The profiler already includes functionality to skip running kernels that
fail `can_implement`.
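
The check described above could look roughly like the sketch below. This is
only an illustration under stated assumptions: `EpilogueSketch`, its
`Arguments` struct, and `is_source_supported` are hypothetical stand-ins,
not the actual CUTLASS collective classes or their signatures.

```cpp
#include <type_traits>

// Hypothetical stand-in for an epilogue collective (not the real CUTLASS type).
// ElementC = void models a void-C kernel, i.e. one with no source tensor C.
template <class ElementC>
struct EpilogueSketch {
  // Assumed argument layout, for illustration only.
  struct Arguments {
    float alpha = 1.0f;
    float beta = 0.0f;
  };

  // A void-C kernel cannot load an epilogue source.
  static constexpr bool is_source_supported = !std::is_void_v<ElementC>;

  // Reject problems that need the source (beta != 0) when the kernel is void-C.
  static bool can_implement(Arguments const& args) {
    if (!is_source_supported && args.beta != 0.0f) {
      return false;
    }
    return true;
  }
};
```

With a check of this shape, a caller that already skips kernels failing
`can_implement` would never launch a void-C kernel for a non-zero beta.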

* Move checks to collectives instead

---------

Co-authored-by: Ali Hassani <ahassani@nvidia.com>
2024-07-31 18:11:58 -04:00
arch CUTLASS 3.5.1 (#1623) 2024-07-29 08:46:24 -04:00
conv CUTLASS 3.5.1 (#1623) 2024-07-29 08:46:24 -04:00
detail CUTLASS 3.5.1 (#1623) 2024-07-29 08:46:24 -04:00
epilogue Skip void-C kernels in the profiler when beta is non zero (#1661) 2024-07-31 18:11:58 -04:00
gemm CUTLASS 3.5.1 (#1623) 2024-07-29 08:46:24 -04:00
layout CUTLASS 3.5.1 (#1623) 2024-07-29 08:46:24 -04:00
pipeline CUTLASS 3.5.1 (#1623) 2024-07-29 08:46:24 -04:00
platform CUTLASS 3.5.1 (#1623) 2024-07-29 08:46:24 -04:00
reduction CUTLASS 3.5.0 (#1411) 2024-03-19 17:51:04 -04:00
thread Update license year (#1306) 2024-01-16 14:37:22 -05:00
transform CUTLASS 3.5.1 (#1623) 2024-07-29 08:46:24 -04:00
aligned_buffer.h Update license year (#1306) 2024-01-16 14:37:22 -05:00
array_planar_complex.h Updates for CUTLASS 3.5.0 (#1468) 2024-04-11 21:33:40 -04:00
array_subbyte.h Updates for CUTLASS 3.5.0 (#1468) 2024-04-11 21:33:40 -04:00
array.h CUTLASS 3.5.1 (#1623) 2024-07-29 08:46:24 -04:00
barrier.h CUTLASS 3.5.0 (#1411) 2024-03-19 17:51:04 -04:00
bfloat16.h CUTLASS 3.5.1 (#1623) 2024-07-29 08:46:24 -04:00
blas3_types.h Update license year (#1306) 2024-01-16 14:37:22 -05:00
blas3.h Update license year (#1306) 2024-01-16 14:37:22 -05:00
block_striped.h Update license year (#1306) 2024-01-16 14:37:22 -05:00
cluster_launch.hpp CUTLASS 3.5.1 (#1623) 2024-07-29 08:46:24 -04:00
complex.h CUTLASS 3.5.1 (#1623) 2024-07-29 08:46:24 -04:00
constants.h Update license year (#1306) 2024-01-16 14:37:22 -05:00
coord.h Updates for CUTLASS 3.5.0 (#1468) 2024-04-11 21:33:40 -04:00
core_io.h Updates for CUTLASS 3.5.0 (#1468) 2024-04-11 21:33:40 -04:00
cuda_host_adapter.hpp CUTLASS 3.5.1 (#1623) 2024-07-29 08:46:24 -04:00
cutlass.h Updates for CUTLASS 3.5.0 (#1468) 2024-04-11 21:33:40 -04:00
device_kernel.h Update license year (#1306) 2024-01-16 14:37:22 -05:00
fast_math.h CUTLASS 3.5.1 (#1623) 2024-07-29 08:46:24 -04:00
float8.h CUTLASS 3.5.1 (#1623) 2024-07-29 08:46:24 -04:00
floating_point_nvrtc.h CUTLASS 3.5.1 (#1623) 2024-07-29 08:46:24 -04:00
functional.h fix build on SM 5.2 (#1664) 2024-07-31 09:54:57 -04:00
gemm_coord.h Update license year (#1306) 2024-01-16 14:37:22 -05:00
gemm_coord.hpp Update license year (#1306) 2024-01-16 14:37:22 -05:00
half.h CUTLASS 3.5.1 (#1623) 2024-07-29 08:46:24 -04:00
integer_subbyte.h CUTLASS 3.5.1 (#1623) 2024-07-29 08:46:24 -04:00
kernel_hardware_info.h Updates for CUTLASS 3.5.0 (#1468) 2024-04-11 21:33:40 -04:00
kernel_hardware_info.hpp Update license year (#1306) 2024-01-16 14:37:22 -05:00
kernel_launch.h Update license year (#1306) 2024-01-16 14:37:22 -05:00
matrix_coord.h Update license year (#1306) 2024-01-16 14:37:22 -05:00
matrix_shape.h Update license year (#1306) 2024-01-16 14:37:22 -05:00
matrix.h Update license year (#1306) 2024-01-16 14:37:22 -05:00
numeric_conversion.h CUTLASS 3.5.1 (#1623) 2024-07-29 08:46:24 -04:00
numeric_size.h CUTLASS 3.5.1 (#1623) 2024-07-29 08:46:24 -04:00
numeric_types.h Updates for CUTLASS 3.5.0 (#1468) 2024-04-11 21:33:40 -04:00
pitch_linear_coord.h Update license year (#1306) 2024-01-16 14:37:22 -05:00
predicate_vector.h Update license year (#1306) 2024-01-16 14:37:22 -05:00
quaternion.h Update license year (#1306) 2024-01-16 14:37:22 -05:00
real.h Update license year (#1306) 2024-01-16 14:37:22 -05:00
relatively_equal.h CUTLASS 3.5.1 (#1623) 2024-07-29 08:46:24 -04:00
semaphore.h Update license year (#1306) 2024-01-16 14:37:22 -05:00
subbyte_reference.h CUTLASS 3.5.1 (#1623) 2024-07-29 08:46:24 -04:00
tensor_coord.h Update license year (#1306) 2024-01-16 14:37:22 -05:00
tensor_ref_planar_complex.h Update license year (#1306) 2024-01-16 14:37:22 -05:00
tensor_ref.h Update license year (#1306) 2024-01-16 14:37:22 -05:00
tensor_view_planar_complex.h Update license year (#1306) 2024-01-16 14:37:22 -05:00
tensor_view.h Update license year (#1306) 2024-01-16 14:37:22 -05:00
tfloat32.h CUTLASS 3.5.1 (#1623) 2024-07-29 08:46:24 -04:00
trace.h Update license year (#1306) 2024-01-16 14:37:22 -05:00
uint128.h Updates for CUTLASS 3.5.0 (#1468) 2024-04-11 21:33:40 -04:00
version.h CUTLASS 3.5.1 (#1623) 2024-07-29 08:46:24 -04:00
wmma_array.h Update license year (#1306) 2024-01-16 14:37:22 -05:00
workspace.h CUTLASS 3.5.1 (#1623) 2024-07-29 08:46:24 -04:00