cutlass/examples
Andrew Kerr 1ab1027954
Updated mma_sm80.h to avoid perf penalty due to reinterpret_cast<>. (#100)
- Updated mma_sm80.h to avoid perf penalty due to reinterpret_cast<>.
- Enhancement to CUTLASS Utility Library's HostTensorPlanarComplex template to support copy-in and copy-out
- Added test_examples target to build and test all CUTLASS examples
- Minor edits to documentation to point to GTC 2020 webinar
2020-06-15 10:47:01 -07:00
00_basic_gemm CUTLASS 2.2 (#96) 2020-06-08 16:17:35 -07:00
01_cutlass_utilities CUTLASS 2.2 (#96) 2020-06-08 16:17:35 -07:00
02_dump_reg_shmem CUTLASS 2.2 (#96) 2020-06-08 16:17:35 -07:00
03_visualize_layout Updated mma_sm80.h to avoid perf penalty due to reinterpret_cast<>. (#100) 2020-06-15 10:47:01 -07:00
04_tile_iterator CUTLASS 2.2 (#96) 2020-06-08 16:17:35 -07:00
05_batched_gemm CUTLASS 2.2 (#96) 2020-06-08 16:17:35 -07:00
06_splitK_gemm Updated mma_sm80.h to avoid perf penalty due to reinterpret_cast<>. (#100) 2020-06-15 10:47:01 -07:00
07_volta_tensorop_gemm Updated mma_sm80.h to avoid perf penalty due to reinterpret_cast<>. (#100) 2020-06-15 10:47:01 -07:00
08_turing_tensorop_gemm Updated mma_sm80.h to avoid perf penalty due to reinterpret_cast<>. (#100) 2020-06-15 10:47:01 -07:00
10_planar_complex CUTLASS 2.2 (#96) 2020-06-08 16:17:35 -07:00
11_planar_complex_array CUTLASS 2.2 (#96) 2020-06-08 16:17:35 -07:00
12_gemm_bias_relu CUTLASS 2.2 (#96) 2020-06-08 16:17:35 -07:00
13_fused_two_gemms CUTLASS 2.2 (#96) 2020-06-08 16:17:35 -07:00
common CUTLASS 2.0 (#62) 2019-11-19 16:55:34 -08:00
CMakeLists.txt Updated mma_sm80.h to avoid perf penalty due to reinterpret_cast<>. (#100) 2020-06-15 10:47:01 -07:00