* Support parallel split-K mode for profiling
Signed-off-by: Peter Han <fujun.han@iluvatar.ai>
* Parallel split-K support
1. Find GEMM kernel by preference key
2. Switch M and N for reduction kernel
Signed-off-by: Peter Han <fujun.han@iluvatar.ai>
* Parallel split-K for FP16 GEMM
* Add one missing file
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>