This is faster since we only need to do atomic adds on dq, instead of atomic adds on both dk and dv.

| Name |
|---|
| gemm.h |
| gmem_tile.h |
| kernel_traits.h |
| mask.h |
| smem_tile.h |
| softmax.h |
| utils.h |