This is faster since we only need to do atomic adds on dq, instead of atomic adds on both dk and dv.

| Name |
|---|
| cutlass@319a389f42 |
| src |
| fmha_api.cpp |