cutlass/tools/library/src/reference
Vijay Thakkar e01b9b5029
Shard gemm reference templates into multiple TUs for parallel compilation (#1043)
* Split apart gemm reference templates into multiple TUs for parallel compilation

* remove old files

* better balancing of ref kernels across TUs

* remove 3 newly added refcheck kernels and some unnecessary fp8 library instances to reduce lib size

* remove auto fp8 kernels

* remove some redundant kernels
2023-08-30 16:46:30 -04:00
| File | Last commit | Date |
|---|---|---|
| conv2d.cu | New updates for 2.11 (#775) | 2023-01-20 16:32:57 -05:00 |
| conv3d.cu | New updates for 2.11 (#775) | 2023-01-20 16:32:57 -05:00 |
| conv_reference_operation.h | CUTLASS 3.1 (#915) | 2023-04-14 23:19:34 -04:00 |
| gemm_e4m3a_e4m3out.cu | Shard gemm reference templates into multiple TUs for parallel compilation (#1043) | 2023-08-30 16:46:30 -04:00 |
| gemm_e4m3a_e5m2out.cu | Shard gemm reference templates into multiple TUs for parallel compilation (#1043) | 2023-08-30 16:46:30 -04:00 |
| gemm_e5m2a_e4m3out.cu | Shard gemm reference templates into multiple TUs for parallel compilation (#1043) | 2023-08-30 16:46:30 -04:00 |
| gemm_e5m2a_e5m2out.cu | Shard gemm reference templates into multiple TUs for parallel compilation (#1043) | 2023-08-30 16:46:30 -04:00 |
| gemm_fp8in_bf16out.cu | Shard gemm reference templates into multiple TUs for parallel compilation (#1043) | 2023-08-30 16:46:30 -04:00 |
| gemm_fp8in_fp16out.cu | Shard gemm reference templates into multiple TUs for parallel compilation (#1043) | 2023-08-30 16:46:30 -04:00 |
| gemm_fp8in_fp32out.cu | Shard gemm reference templates into multiple TUs for parallel compilation (#1043) | 2023-08-30 16:46:30 -04:00 |
| gemm_fp32out.cu | Shard gemm reference templates into multiple TUs for parallel compilation (#1043) | 2023-08-30 16:46:30 -04:00 |
| gemm_fp_other.cu | Shard gemm reference templates into multiple TUs for parallel compilation (#1043) | 2023-08-30 16:46:30 -04:00 |
| gemm_int4.cu | Shard gemm reference templates into multiple TUs for parallel compilation (#1043) | 2023-08-30 16:46:30 -04:00 |
| gemm_int8_canonical.cu | Shard gemm reference templates into multiple TUs for parallel compilation (#1043) | 2023-08-30 16:46:30 -04:00 |
| gemm_int8_interleaved_32.cu | Shard gemm reference templates into multiple TUs for parallel compilation (#1043) | 2023-08-30 16:46:30 -04:00 |
| gemm_int8_interleaved_64.cu | Shard gemm reference templates into multiple TUs for parallel compilation (#1043) | 2023-08-30 16:46:30 -04:00 |
| gemm_reference_operation.h | CUTLASS 3.1 (#915) | 2023-04-14 23:19:34 -04:00 |
| initialize_reference_operations.cu | Shard gemm reference templates into multiple TUs for parallel compilation (#1043) | 2023-08-30 16:46:30 -04:00 |