cutlass/python/cutlass_library
Ali Hassani eee0cab26c
Stamp out 1x1x1 clusters, 128x256 CTA shape (#1665)
Adds 128x256 tile shapes to FP16/BF16 and FP8 generators.
Also adds 1x1x1 clusters to all existing FP16/BF16/FP8 generators.

NOTE: it is important to set the kernel filter (--kernels /
CUTLASS_LIBRARY_KERNELS) to a non-empty string and skip pruning to get
all of the new configurations.

If profiling exhaustively, the filter can be set to `*`.
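
As a rough illustration of why `*` selects everything, here is a toy glob filter built on Python's standard `fnmatch` module. It is only a sketch: the kernel names and the `select_kernels` helper are hypothetical, and the matching rules actually used by cutlass_library may differ. It also does not model pruning.

```python
from fnmatch import fnmatchcase

# Hypothetical kernel names, made up for illustration only.
KERNEL_NAMES = [
    "gemm_f16_128x256_cluster_1x1x1",
    "gemm_f16_128x128_cluster_2x1x1",
    "gemm_e4m3_128x256_cluster_1x1x1",
]


def select_kernels(names, kernel_filter):
    """Return the names matching a glob-style filter (toy model, not CUTLASS code)."""
    return [name for name in names if fnmatchcase(name, kernel_filter)]


# "*" matches every name, so nothing is filtered out.
everything = select_kernels(KERNEL_NAMES, "*")

# A narrower pattern keeps only the names containing the 128x256 tile shape.
only_128x256 = select_kernels(KERNEL_NAMES, "*128x256*")

print(everything)
print(only_128x256)
```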

Number of CUTLASS 3.X GEMMs before this commit: 2868
Number of CUTLASS 3.X GEMMs after this commit: 4016
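
For scale, the two counts above can be compared with a few lines of arithmetic. The variable names are illustrative; only the two totals come from the commit message.

```python
# Totals quoted in the commit message.
gemms_before = 2868
gemms_after = 4016

# Number of GEMM configurations added, and the relative growth.
gemms_added = gemms_after - gemms_before
growth = gemms_added / gemms_before

print(f"added {gemms_added} GEMMs ({growth:.1%} increase)")
# prints: added 1148 GEMMs (40.0% increase)
```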

Co-authored-by: Ali Hassani <ahassani@nvidia.com>
2024-07-31 20:22:29 -04:00
__init__.py Update license year (#1306) 2024-01-16 14:37:22 -05:00
conv2d_operation.py CUTLASS 3.5.1 (#1623) 2024-07-29 08:46:24 -04:00
conv3d_operation.py CUTLASS 3.5.0 (#1411) 2024-03-19 17:51:04 -04:00
conv3x_emitter.py CUTLASS 3.5.1 (#1623) 2024-07-29 08:46:24 -04:00
gemm_operation.py CUTLASS 3.5.1 (#1623) 2024-07-29 08:46:24 -04:00
generator.py Stamp out 1x1x1 clusters, 128x256 CTA shape (#1665) 2024-07-31 20:22:29 -04:00
library.py Updates for CUTLASS 3.5.0 (#1468) 2024-04-11 21:33:40 -04:00
manifest.py CUTLASS 3.5.1 (#1623) 2024-07-29 08:46:24 -04:00
rank_2k_operation.py Update license year (#1306) 2024-01-16 14:37:22 -05:00
rank_k_operation.py Update license year (#1306) 2024-01-16 14:37:22 -05:00
symm_operation.py Update license year (#1306) 2024-01-16 14:37:22 -05:00
trmm_operation.py Update license year (#1306) 2024-01-16 14:37:22 -05:00