cutlass/tools/profiler/src
Fujun Han 1e4703cbab
Support parallel split K mode for profiling (#277)
* Support parallel split K mode for profiling

Signed-off-by: Peter Han <fujun.han@iluvatar.ai>

* Parallel Split K support

  1. find gemm kernel by preference key
  2. switch m n for reduction kernel

Signed-off-by: Peter Han <fujun.han@iluvatar.ai>

* parallel splitk for fp16 gemm

* add one missing file

Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-01-27 10:37:37 -05:00
conv2d_operation_profiler.cu Cutlass 2.6 Update 1 (#301) 2021-07-27 17:58:30 -07:00
conv2d_operation_profiler.h Cutlass 2.6 Update 1 (#301) 2021-07-27 17:58:30 -07:00
conv3d_operation_profiler.cu Cutlass 2.6 Update 1 (#301) 2021-07-27 17:58:30 -07:00
conv3d_operation_profiler.h Cutlass 2.6 Update 1 (#301) 2021-07-27 17:58:30 -07:00
cublas_helpers.cu Updates to fused epilogue (#383) 2021-12-17 16:04:43 -05:00
cublas_helpers.h CUTLASS 2.8 (#363) 2021-11-19 13:26:35 -08:00
cudnn_helpers.cpp Cutlass 2.6 Update 1 (#301) 2021-07-27 17:58:30 -07:00
cudnn_helpers.h Cutlass 2.6 Update 1 (#301) 2021-07-27 17:58:30 -07:00
cutlass_profiler.cu Cutlass 2.6 Update 1 (#301) 2021-07-27 17:58:30 -07:00
cutlass_profiler.h Cutlass 2.6 Update 1 (#301) 2021-07-27 17:58:30 -07:00
debug.h Cutlass 2.6 Update 1 (#301) 2021-07-27 17:58:30 -07:00
device_allocation.cu Cutlass 2.6 Update 1 (#301) 2021-07-27 17:58:30 -07:00
device_allocation.h Cutlass 2.6 Update 1 (#301) 2021-07-27 17:58:30 -07:00
device_context.cu Cutlass 2.6 Update 1 (#301) 2021-07-27 17:58:30 -07:00
device_context.h Cutlass 2.6 Update 1 (#301) 2021-07-27 17:58:30 -07:00
enumerated_types.cpp Cutlass 2.6 Update 1 (#301) 2021-07-27 17:58:30 -07:00
enumerated_types.h Cutlass 2.6 Update 1 (#301) 2021-07-27 17:58:30 -07:00
gemm_operation_profiler.cu Support parallel split K mode for profiling (#277) 2022-01-27 10:37:37 -05:00
gemm_operation_profiler.h Support parallel split K mode for profiling (#277) 2022-01-27 10:37:37 -05:00
gpu_timer.cpp Cutlass 2.6 Update 1 (#301) 2021-07-27 17:58:30 -07:00
gpu_timer.h CUTLASS 2.6.1 - functional and performance enhancements to strided DGRAD, fixes, and tuning 2021-09-03 10:26:15 -07:00
main.cpp Cutlass 2.6 Update 1 (#301) 2021-07-27 17:58:30 -07:00
operation_profiler.cu CUTLASS 2.8 (#363) 2021-11-19 13:26:35 -08:00
operation_profiler.h Cutlass 2.6 Update 1 (#301) 2021-07-27 17:58:30 -07:00
options.cu CUTLASS 2.8 (#363) 2021-11-19 13:26:35 -08:00
options.h CUTLASS 2.8 (#363) 2021-11-19 13:26:35 -08:00
performance_report.cpp CUTLASS 2.8 (#363) 2021-11-19 13:26:35 -08:00
performance_report.h CUTLASS 2.8 (#363) 2021-11-19 13:26:35 -08:00
performance_result.cu Cutlass 2.6 Update 1 (#301) 2021-07-27 17:58:30 -07:00
performance_result.h Cutlass 2.6 Update 1 (#301) 2021-07-27 17:58:30 -07:00
problem_space.cpp Cutlass 2.6 Update 1 (#301) 2021-07-27 17:58:30 -07:00
problem_space.h Cutlass 2.6 Update 1 (#301) 2021-07-27 17:58:30 -07:00
reduction_operation_profiler.h Cutlass 2.6 Update 1 (#301) 2021-07-27 17:58:30 -07:00
sparse_gemm_operation_profiler.cu Cutlass 2.6 Update 1 (#301) 2021-07-27 17:58:30 -07:00
sparse_gemm_operation_profiler.h Cutlass 2.6 Update 1 (#301) 2021-07-27 17:58:30 -07:00