| Example | Last commit | Date |
|---|---|---|
| 00_basic_gemm | CUTLASS 2.9 (#468) | 2022-04-23 15:02:38 -04:00 |
| 01_cutlass_utilities | CUTLASS 2.9 (#468) | 2022-04-23 15:02:38 -04:00 |
| 02_dump_reg_shmem | CUTLASS 2.9 (#468) | 2022-04-23 15:02:38 -04:00 |
| 03_visualize_layout | Fix for #596 (typo in example 03) (#597) | 2022-08-09 09:58:36 -04:00 |
| 04_tile_iterator | Remove redundant <fstream> includes (#563) | 2022-07-19 15:23:54 -04:00 |
| 05_batched_gemm | CUTLASS 2.9 (#468) | 2022-04-23 15:02:38 -04:00 |
| 06_splitK_gemm | fix: fix types in example 06 (#587) | 2022-07-29 12:46:06 -04:00 |
| 07_volta_tensorop_gemm | CUTLASS 2.9 (#468) | 2022-04-23 15:02:38 -04:00 |
| 08_turing_tensorop_gemm | CUTLASS 2.9 (#468) | 2022-04-23 15:02:38 -04:00 |
| 09_turing_tensorop_conv2dfprop | Remove redundant <fstream> includes (#563) | 2022-07-19 15:23:54 -04:00 |
| 10_planar_complex | Remove redundant <fstream> includes (#563) | 2022-07-19 15:23:54 -04:00 |
| 11_planar_complex_array | Remove redundant <fstream> includes (#563) | 2022-07-19 15:23:54 -04:00 |
| 12_gemm_bias_relu | CUTLASS 2.10 (#615) | 2022-09-03 18:48:46 -04:00 |
| 13_two_tensor_op_fusion | Add residual support for the shmem staging iterator used in back-to-back GEMM fusion; allows problem_size_0_n that is not a multiple of 32 (#590) | 2022-08-15 11:19:24 -04:00 |
| 14_ampere_tf32_tensorop_gemm | CUTLASS 2.9 (#468) | 2022-04-23 15:02:38 -04:00 |
| 15_ampere_sparse_tensorop_gemm | CUTLASS 2.9 (#468) | 2022-04-23 15:02:38 -04:00 |
| 16_ampere_tensorop_conv2dfprop | CUTLASS 2.10 (#615) | 2022-09-03 18:48:46 -04:00 |
| 17_fprop_per_channel_bias | CUTLASS 2.10 (#615) | 2022-09-03 18:48:46 -04:00 |
| 18_ampere_fp64_tensorop_affine2_gemm | CUTLASS 2.10 (#615) | 2022-09-03 18:48:46 -04:00 |
| 19_tensorop_canonical | CUTLASS 2.9 (#468) | 2022-04-23 15:02:38 -04:00 |
| 20_simt_canonical | CUTLASS 2.9 (#468) | 2022-04-23 15:02:38 -04:00 |
| 21_quaternion_gemm | CUTLASS 2.9 (#468) | 2022-04-23 15:02:38 -04:00 |
| 22_quaternion_conv | Remove redundant <fstream> includes (#563) | 2022-07-19 15:23:54 -04:00 |
| 23_ampere_gemm_operand_reduction_fusion | Bug fixes and enhancement to GEMM reductionK fusion (#682) | 2022-11-03 11:07:50 -04:00 |
| 24_gemm_grouped | CUTLASS 2.10 updates (#622) | 2022-09-12 21:26:30 -04:00 |
| 25_ampere_fprop_mainloop_fusion | CUTLASS 2.10 (#615) | 2022-09-03 18:48:46 -04:00 |
| 26_ampere_wgrad_mainloop_fusion | CUTLASS 2.10 (#615) | 2022-09-03 18:48:46 -04:00 |
| 27_ampere_3xtf32_fast_accurate_tensorop_gemm | CUTLASS 2.9 (#468) | 2022-04-23 15:02:38 -04:00 |
| 28_ampere_3xtf32_fast_accurate_tensorop_fprop | CUTLASS 2.10 (#615) | 2022-09-03 18:48:46 -04:00 |
| 29_ampere_3xtf32_fast_accurate_tensorop_complex_gemm | CUTLASS 2.9 (#468) | 2022-04-23 15:02:38 -04:00 |
| 30_wgrad_split_k | CUTLASS 2.10 (#615) | 2022-09-03 18:48:46 -04:00 |
| 31_basic_syrk | [examples] Fix typos in SYRK and TRMM examples (#507) | 2022-06-03 22:52:41 -04:00 |
| 32_basic_trmm | [examples] Fix typos in SYRK and TRMM examples (#507) | 2022-06-03 22:52:41 -04:00 |
| 33_ampere_3xtf32_tensorop_symm | CUTLASS 2.9 (#468) | 2022-04-23 15:02:38 -04:00 |
| 34_transposed_conv2d | CUTLASS 2.10 (#615) | 2022-09-03 18:48:46 -04:00 |
| 35_gemm_softmax | Upstream internal updates (#616) | 2022-09-04 23:05:09 -04:00 |
| 36_gather_scatter_fusion | CUTLASS 2.10 updates (#622) | 2022-09-12 21:26:30 -04:00 |
| 37_gemm_layernorm_gemm_fusion | CUTLASS 2.10 (#615) | 2022-09-03 18:48:46 -04:00 |
| 38_syr2k_grouped | CUTLASS 2.10 updates (#622) | 2022-09-12 21:26:30 -04:00 |
| 39_gemm_permute | fix(permute.h): incorrect comment in Tensor5DPermute20314 (#637) | 2022-09-22 09:21:13 -04:00 |
| 40_cutlass_py | Bump CUTLASS Python container version (#672) | 2022-10-22 21:09:39 -04:00 |
| 41_multi_head_attention | Remove excessive includes from examples/41_multi_head_attention (#669) | 2022-10-21 22:23:15 -04:00 |
| 42_fused_multi_head_attention | ex42: Fused MHA imported from xFormers (#662) | 2022-10-17 10:49:33 -04:00 |
| 43_dual_gemm | Example 43 - DualGemm (#670) | 2022-10-26 14:04:42 -04:00 |
| common | CUTLASS 2.0 (#62) | 2019-11-19 16:55:34 -08:00 |
| CMakeLists.txt | Example 43 - DualGemm (#670) | 2022-10-26 14:04:42 -04:00 |
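The first entry above, 00_basic_gemm, introduces the simplest use of the CUTLASS 2.x device-level GEMM API. The snippet below is a minimal sketch in the spirit of that example rather than its actual source; the wrapper function `run_sgemm` and the column-major layout choice are assumptions made for illustration, while `cutlass::gemm::device::Gemm` and its `Arguments` struct are the real CUTLASS 2.x API.

```cpp
// Minimal sketch (not the example's source): single-precision GEMM
// D = alpha * A * B + beta * C with CUTLASS 2.x defaults.
#include "cutlass/gemm/device/gemm.h"

cutlass::Status run_sgemm(int M, int N, int K,
                          float alpha, float const *A, int lda,   // device pointers
                          float const *B, int ldb,
                          float beta, float *C, int ldc) {
  // Column-major SGEMM; every template parameter beyond element types
  // and layouts is left at its default.
  using ColumnMajor = cutlass::layout::ColumnMajor;
  using Gemm = cutlass::gemm::device::Gemm<float, ColumnMajor,    // A
                                           float, ColumnMajor,    // B
                                           float, ColumnMajor>;   // C / D

  Gemm gemm_op;
  Gemm::Arguments args({M, N, K},       // GEMM problem size
                       {A, lda},        // TensorRef to A
                       {B, ldb},        // TensorRef to B
                       {C, ldc},        // TensorRef to C (epilogue source)
                       {C, ldc},        // TensorRef to D (output, written in place)
                       {alpha, beta});  // linear-combination epilogue scalars

  return gemm_op(args);  // launches the kernel on the default stream
}
```

Most of the later examples in this directory follow the same pattern, swapping in different element types, layouts, architecture-specific tile shapes, or fused epilogues.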