* Actually use float accumulation in gemm_f16t_f16t_f16t_wmma_tensor_op_f32_sm70.cu

  As title

* Update gemm_f16t_f16t_f16t_wmma_tensor_op_f32_sm70.cu

  change the missing one

Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>
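For context on what the fix changes: in CUTLASS's test naming, the `f32` in `gemm_f16t_f16t_f16t_wmma_tensor_op_f32_sm70.cu` indicates the accumulator element type, so the half-precision inputs should be accumulated in single precision rather than half. Below is a minimal standalone sketch of f16-input / f32-accumulation WMMA at the raw CUDA level, not the CUTLASS test itself; the single-warp kernel, 16x16x16 shape, and row-major layouts are illustrative assumptions.

```cpp
// Minimal single-warp WMMA tile: half inputs, float accumulation.
// Requires compute capability 7.0+ (compile with -arch=sm_70).
#include <mma.h>
#include <cuda_fp16.h>

using namespace nvcuda;

// A, B: row-major 16x16 half matrices; D: row-major 16x16 float output.
// (The actual test's epilogue converts the result back to f16.)
__global__ void wmma_f16_in_f32_acc(half const *A, half const *B, float *D) {
  wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
  wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::row_major> b_frag;
  // The point of the commit: the accumulator fragment is float, not half.
  wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc_frag;

  wmma::fill_fragment(acc_frag, 0.0f);
  wmma::load_matrix_sync(a_frag, A, 16);  // leading dimension = 16
  wmma::load_matrix_sync(b_frag, B, 16);
  wmma::mma_sync(acc_frag, a_frag, b_frag, acc_frag);
  wmma::store_matrix_sync(D, acc_frag, 16, wmma::mem_row_major);
}
```

Launching with a single warp (`wmma_f16_in_f32_acc<<<1, 32>>>(A, B, D)`) computes one 16x16 tile. The CUTLASS test exercises the same idea through the device/kernel/threadblock/warp hierarchy listed below, where the accumulator type is passed as a template parameter.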
| Name |
|---|
| device |
| kernel |
| thread |
| threadblock |
| warp |
| CMakeLists.txt |