cutlass/examples
Lain 8aa95dbb88
Fix the racing condition of mixed-input gemm when writing the registers (#1931)
* move two warpgroup_wait

* merge main

---------

Co-authored-by: Siyuan Fu <siyuanf@nvidia.com>
2024-11-08 13:15:54 -05:00
00_basic_gemm Update license year (#1306) 2024-01-16 14:37:22 -05:00
01_cutlass_utilities Update license year (#1306) 2024-01-16 14:37:22 -05:00
02_dump_reg_shmem Updates for CUTLASS 3.4.1 (#1346) 2024-02-15 15:48:34 -05:00
03_visualize_layout CUTLASS 3.5.0 (#1411) 2024-03-19 17:51:04 -04:00
04_tile_iterator CUTLASS 3.5.0 (#1411) 2024-03-19 17:51:04 -04:00
05_batched_gemm CUTLASS 3.5.0 (#1411) 2024-03-19 17:51:04 -04:00
06_splitK_gemm Update license year (#1306) 2024-01-16 14:37:22 -05:00
07_volta_tensorop_gemm CUTLASS 3.5.1 (#1623) 2024-07-29 08:46:24 -04:00
08_turing_tensorop_gemm CUTLASS 3.5.1 (#1623) 2024-07-29 08:46:24 -04:00
09_turing_tensorop_conv2dfprop Update license year (#1306) 2024-01-16 14:37:22 -05:00
10_planar_complex Update license year (#1306) 2024-01-16 14:37:22 -05:00
11_planar_complex_array Update license year (#1306) 2024-01-16 14:37:22 -05:00
12_gemm_bias_relu CUTLASS 3.5.1 (#1623) 2024-07-29 08:46:24 -04:00
13_two_tensor_op_fusion CUTLASS 3.5.1 (#1623) 2024-07-29 08:46:24 -04:00
14_ampere_tf32_tensorop_gemm CUTLASS 3.5.1 (#1623) 2024-07-29 08:46:24 -04:00
15_ampere_sparse_tensorop_gemm CUTLASS 3.5.1 (#1623) 2024-07-29 08:46:24 -04:00
16_ampere_tensorop_conv2dfprop Updates for CUTLASS 3.5.0 (#1468) 2024-04-11 21:33:40 -04:00
17_fprop_per_channel_bias Update license year (#1306) 2024-01-16 14:37:22 -05:00
18_ampere_fp64_tensorop_affine2_gemm Update license year (#1306) 2024-01-16 14:37:22 -05:00
19_tensorop_canonical Update license year (#1306) 2024-01-16 14:37:22 -05:00
20_simt_canonical Update license year (#1306) 2024-01-16 14:37:22 -05:00
21_quaternion_gemm Update license year (#1306) 2024-01-16 14:37:22 -05:00
22_quaternion_conv Update license year (#1306) 2024-01-16 14:37:22 -05:00
23_ampere_gemm_operand_reduction_fusion CUTLASS 3.5.1 (#1623) 2024-07-29 08:46:24 -04:00
24_gemm_grouped CUTLASS 3.5.0 (#1411) 2024-03-19 17:51:04 -04:00
25_ampere_fprop_mainloop_fusion Update license year (#1306) 2024-01-16 14:37:22 -05:00
26_ampere_wgrad_mainloop_fusion Update license year (#1306) 2024-01-16 14:37:22 -05:00
27_ampere_3xtf32_fast_accurate_tensorop_gemm CUTLASS 3.5.1 (#1623) 2024-07-29 08:46:24 -04:00
28_ampere_3xtf32_fast_accurate_tensorop_fprop Update license year (#1306) 2024-01-16 14:37:22 -05:00
29_ampere_3xtf32_fast_accurate_tensorop_complex_gemm CUTLASS 3.5.1 (#1623) 2024-07-29 08:46:24 -04:00
30_wgrad_split_k Update license year (#1306) 2024-01-16 14:37:22 -05:00
31_basic_syrk CUTLASS 3.5.0 (#1411) 2024-03-19 17:51:04 -04:00
32_basic_trmm Update license year (#1306) 2024-01-16 14:37:22 -05:00
33_ampere_3xtf32_tensorop_symm CUTLASS 3.5.1 (#1623) 2024-07-29 08:46:24 -04:00
34_transposed_conv2d Update license year (#1306) 2024-01-16 14:37:22 -05:00
35_gemm_softmax CUTLASS 3.6.0 (#1850) 2024-10-09 15:33:27 -04:00
36_gather_scatter_fusion CUTLASS 3.5.1 (#1623) 2024-07-29 08:46:24 -04:00
37_gemm_layernorm_gemm_fusion CUTLASS 3.5.0 (#1411) 2024-03-19 17:51:04 -04:00
38_syr2k_grouped CUTLASS 3.5.0 (#1411) 2024-03-19 17:51:04 -04:00
39_gemm_permute Update license year (#1306) 2024-01-16 14:37:22 -05:00
40_cutlass_py CUTLASS 3.5.1 (#1623) 2024-07-29 08:46:24 -04:00
41_fused_multi_head_attention CUTLASS 3.5.1 (#1623) 2024-07-29 08:46:24 -04:00
42_ampere_tensorop_group_conv Update license year (#1306) 2024-01-16 14:37:22 -05:00
43_ell_block_sparse_gemm CUTLASS 3.5.0 (#1411) 2024-03-19 17:51:04 -04:00
44_multi_gemm_ir_and_codegen Update license year (#1306) 2024-01-16 14:37:22 -05:00
45_dual_gemm Update license year (#1306) 2024-01-16 14:37:22 -05:00
46_depthwise_simt_conv2dfprop Update license year (#1306) 2024-01-16 14:37:22 -05:00
47_ampere_gemm_universal_streamk Update license year (#1306) 2024-01-16 14:37:22 -05:00
48_hopper_warp_specialized_gemm CUTLASS 3.6.0 (#1850) 2024-10-09 15:33:27 -04:00
49_hopper_gemm_with_collective_builder CUTLASS 3.5.1 (#1623) 2024-07-29 08:46:24 -04:00
50_hopper_gemm_with_epilogue_swizzle CUTLASS 3.5.1 (#1623) 2024-07-29 08:46:24 -04:00
51_hopper_gett CUTLASS 3.5.0 (#1411) 2024-03-19 17:51:04 -04:00
52_hopper_gather_scatter_fusion CUTLASS 3.6.0 (#1850) 2024-10-09 15:33:27 -04:00
53_hopper_gemm_permute CUTLASS 3.6.0 (#1850) 2024-10-09 15:33:27 -04:00
54_hopper_fp8_warp_specialized_gemm CUTLASS 3.5.1 (#1623) 2024-07-29 08:46:24 -04:00
55_hopper_mixed_dtype_gemm Fix the racing condition of mixed-input gemm when writing the registers (#1931) 2024-11-08 13:15:54 -05:00
56_hopper_ptr_array_batched_gemm CUTLASS 3.6.0 (#1850) 2024-10-09 15:33:27 -04:00
57_hopper_grouped_gemm CUTLASS 3.6.0 (#1850) 2024-10-09 15:33:27 -04:00
58_ada_fp8_gemm CUTLASS 3.5.1 (#1623) 2024-07-29 08:46:24 -04:00
59_ampere_gather_scatter_conv CUTLASS 3.6.0 (#1850) 2024-10-09 15:33:27 -04:00
60_cutlass_import Update license year (#1306) 2024-01-16 14:37:22 -05:00
61_hopper_gemm_with_topk_and_softmax CUTLASS 3.6.0 (#1850) 2024-10-09 15:33:27 -04:00
62_hopper_sparse_gemm CUTLASS 3.6.0 (#1850) 2024-10-09 15:33:27 -04:00
63_hopper_gemm_with_weight_prefetch CUTLASS 3.6.0 (#1850) 2024-10-09 15:33:27 -04:00
common CUTLASS 3.5.0 (#1411) 2024-03-19 17:51:04 -04:00
cute CUTLASS 3.6.0 (#1850) 2024-10-09 15:33:27 -04:00
python Updates for 3.4 release. (#1305) 2024-01-16 13:42:51 -05:00
CMakeLists.txt CUTLASS 3.6.0 (#1850) 2024-10-09 15:33:27 -04:00