cutlass/include/cutlass
Ali Hassani d4be5ab5d7
Allow per-column bias in EpilogueTensorBroadcast (#1275)
* Allow per-column bias in EpilogueTensorBroadcast

EpilogueTensorBroadcast only supports per-row vector broadcast, because
the bias stride is hardcoded.

It can support both per-row and per-column broadcast if the bias stride
is made conditional; defaulting to per-row preserves the original behavior.
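
The idea of a conditional bias stride can be illustrated with a small host-side sketch. This is not the CUTLASS implementation: `BiasStride`, `bias_stride`, `add_bias`, and the `PerColumnBias` flag are hypothetical names chosen for illustration, and a row-major M x N accumulator is assumed. The stride pair (1, 0) indexes the bias by row and broadcasts it across columns; (0, 1) does the opposite. The template flag defaults to `false`, so existing callers keep the per-row behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a 2-D bias stride:
// m = step per output row, n = step per output column.
struct BiasStride {
  std::size_t m;
  std::size_t n;
};

// Select the stride at compile time. The default (false) is per-row,
// which mirrors the previously hardcoded behavior described above.
template <bool PerColumnBias = false>
constexpr BiasStride bias_stride() {
  return PerColumnBias ? BiasStride{0, 1} : BiasStride{1, 0};
}

// Add a broadcast bias vector to a row-major M x N accumulator.
// Per-row bias has M entries; per-column bias has N entries.
template <bool PerColumnBias = false>
std::vector<float> add_bias(const std::vector<float>& acc,
                            const std::vector<float>& bias,
                            std::size_t M, std::size_t N) {
  constexpr BiasStride s = bias_stride<PerColumnBias>();
  assert(acc.size() == M * N);
  assert(bias.size() == (PerColumnBias ? N : M));

  std::vector<float> out(acc.size());
  for (std::size_t i = 0; i < M; ++i) {
    for (std::size_t j = 0; j < N; ++j) {
      // A zero stride in one dimension broadcasts the bias along it.
      out[i * N + j] = acc[i * N + j] + bias[i * s.m + j * s.n];
    }
  }
  return out;
}
```

Calling `add_bias<>(acc, bias, M, N)` applies one bias value per row, while `add_bias<true>(acc, bias, M, N)` applies one per column; the indexing loop itself is the same in both cases, and only the stride changes.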

* Add unit test for EpilogueTensorBroadcast with per-col bias

---------

Co-authored-by: Ali Hassani <ahassanijr@gmail.com>
Co-authored-by: Ali Hassani <ali@hippoml.com>
2024-01-04 12:48:31 -05:00
arch CUTLASS 3.4.0 (#1286) 2023-12-29 15:21:31 -05:00
conv CUTLASS 3.2.1 (#1113) 2023-09-26 17:24:26 -04:00
detail CUTLASS 3.4.0 (#1286) 2023-12-29 15:21:31 -05:00
epilogue Allow per-column bias in EpilogueTensorBroadcast (#1275) 2024-01-04 12:48:31 -05:00
gemm Add support for sparse GEMM with visitor epilogue (#1189) 2024-01-04 12:38:11 -05:00
layout CUTLASS 3.4.0 (#1286) 2023-12-29 15:21:31 -05:00
pipeline CUTLASS 3.4.0 (#1286) 2023-12-29 15:21:31 -05:00
platform CUTLASS 3.3.0 (#1167) 2023-11-02 11:09:05 -04:00
reduction Fix typos 2 (#842) 2023-03-09 23:22:56 -05:00
thread CUTLASS 3.2 (#1024) 2023-08-07 20:50:32 -04:00
transform CUTLASS 3.4.0 (#1286) 2023-12-29 15:21:31 -05:00
aligned_buffer.h New updates for 2.11 (#775) 2023-01-20 16:32:57 -05:00
array_planar_complex.h New updates for 2.11 (#775) 2023-01-20 16:32:57 -05:00
array_subbyte.h CUTLASS 3.4.0 (#1286) 2023-12-29 15:21:31 -05:00
array.h CUTLASS 3.3.0 (#1167) 2023-11-02 11:09:05 -04:00
barrier.h CUTLASS 3.4.0 (#1286) 2023-12-29 15:21:31 -05:00
bfloat16.h CUTLASS 3.4.0 (#1286) 2023-12-29 15:21:31 -05:00
blas3_types.h CUTLASS 3.2 (#1024) 2023-08-07 20:50:32 -04:00
blas3.h CUTLASS 3.2 (#1024) 2023-08-07 20:50:32 -04:00
block_striped.h CUTLASS 3.2 (#1024) 2023-08-07 20:50:32 -04:00
cluster_launch.hpp CUTLASS 3.4.0 (#1286) 2023-12-29 15:21:31 -05:00
complex.h CUTLASS 3.4.0 (#1286) 2023-12-29 15:21:31 -05:00
constants.h New updates for 2.11 (#775) 2023-01-20 16:32:57 -05:00
coord.h CUTLASS 3.2.1 (#1113) 2023-09-26 17:24:26 -04:00
core_io.h CUTLASS 3.3.0 (#1167) 2023-11-02 11:09:05 -04:00
cuda_host_adapter.hpp CUTLASS 3.4.0 (#1286) 2023-12-29 15:21:31 -05:00
cutlass.h CUTLASS 3.2.1 (#1113) 2023-09-26 17:24:26 -04:00
device_kernel.h CUTLASS 3.2 (#1024) 2023-08-07 20:50:32 -04:00
fast_math.h CUTLASS 3.4.0 (#1286) 2023-12-29 15:21:31 -05:00
float8.h Updates and Bug fixes to CUTLASS 3.3 (#1232) 2023-12-05 09:50:49 -05:00
floating_point_nvrtc.h New updates for 2.11 (#775) 2023-01-20 16:32:57 -05:00
functional.h CUTLASS 3.3.0 (#1167) 2023-11-02 11:09:05 -04:00
gemm_coord.h CUTLASS 3.2 (#1024) 2023-08-07 20:50:32 -04:00
gemm_coord.hpp Collection of changes to fix clang build. (#1200) 2023-12-08 14:42:12 -05:00
half.h Fix some sign conversion warnings (#1172) 2023-11-30 00:28:40 -05:00
integer_subbyte.h [fix] fix comparison operator for integer_subbyte (#1090) 2023-09-26 17:26:12 -04:00
kernel_hardware_info.h CUTLASS 3.3.0 (#1167) 2023-11-02 11:09:05 -04:00
kernel_hardware_info.hpp CUTLASS 3.2.1 (#1113) 2023-09-26 17:24:26 -04:00
kernel_launch.h New updates for 2.11 (#775) 2023-01-20 16:32:57 -05:00
matrix_coord.h New updates for 2.11 (#775) 2023-01-20 16:32:57 -05:00
matrix_shape.h New updates for 2.11 (#775) 2023-01-20 16:32:57 -05:00
matrix.h New updates for 2.11 (#775) 2023-01-20 16:32:57 -05:00
numeric_conversion.h CUTLASS 3.4.0 (#1286) 2023-12-29 15:21:31 -05:00
numeric_size.h CUTLASS 3.2.1 (#1113) 2023-09-26 17:24:26 -04:00
numeric_types.h CUTLASS 3.3.0 (#1167) 2023-11-02 11:09:05 -04:00
pitch_linear_coord.h New updates for 2.11 (#775) 2023-01-20 16:32:57 -05:00
predicate_vector.h CUTLASS 3.3.0 (#1167) 2023-11-02 11:09:05 -04:00
quaternion.h CUTLASS 3.2 (#1024) 2023-08-07 20:50:32 -04:00
real.h New updates for 2.11 (#775) 2023-01-20 16:32:57 -05:00
relatively_equal.h CUTLASS 3.3.0 (#1167) 2023-11-02 11:09:05 -04:00
semaphore.h Updates for 3.1 (#932) 2023-04-29 09:34:27 -04:00
subbyte_reference.h CUTLASS 3.4.0 (#1286) 2023-12-29 15:21:31 -05:00
tensor_coord.h New updates for 2.11 (#775) 2023-01-20 16:32:57 -05:00
tensor_ref_planar_complex.h New updates for 2.11 (#775) 2023-01-20 16:32:57 -05:00
tensor_ref.h New updates for 2.11 (#775) 2023-01-20 16:32:57 -05:00
tensor_view_planar_complex.h New updates for 2.11 (#775) 2023-01-20 16:32:57 -05:00
tensor_view.h New updates for 2.11 (#775) 2023-01-20 16:32:57 -05:00
tfloat32.h New updates for 2.11 (#775) 2023-01-20 16:32:57 -05:00
trace.h New updates for 2.11 (#775) 2023-01-20 16:32:57 -05:00
uint128.h CUTLASS 3.4.0 (#1286) 2023-12-29 15:21:31 -05:00
wmma_array.h Fix several typos (#1169) 2023-11-02 23:54:46 -04:00
workspace.h CUTLASS 3.3.0 (#1167) 2023-11-02 11:09:05 -04:00