This is faster since we only need to do atomic adds on dq, instead of atomic adds on both dk and dv.
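
One way to read this claim is that the backward pass is split across key/value blocks: each worker then owns the `dk` and `dv` rows of its block and can write them directly, while every worker contributes to the same `dq` rows, so only `dq` needs an accumulating (atomic) add. The sketch below illustrates that reading in plain Python. It is not the library's kernel: the function names, the `block` parameter, and the sequential loop standing in for parallel workers are all assumptions made for illustration.

```python
import math


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def attention_forward(q, k, v):
    """Return (p, out) for out = softmax(q k^T / sqrt(d)) v."""
    scale = 1.0 / math.sqrt(len(q[0]))
    scores = [[scale * x for x in row] for row in matmul(q, transpose(k))]
    p = []
    for row in scores:
        top = max(row)
        exps = [math.exp(x - top) for x in row]
        total = sum(exps)
        p.append([e / total for e in exps])
    return p, matmul(p, v)


def attention_backward_by_key_block(q, k, v, d_out, block=2):
    """Hypothetical sketch: gradients computed one key/value block at a time."""
    n, d = len(q), len(q[0])
    scale = 1.0 / math.sqrt(d)
    p, out = attention_forward(q, k, v)
    # delta[i] = sum_c d_out[i][c] * out[i][c], used by the softmax backward.
    delta = [sum(a * b for a, b in zip(r1, r2)) for r1, r2 in zip(d_out, out)]

    dq = [[0.0] * d for _ in range(n)]
    dk, dv = [], []
    # Each iteration stands in for one parallel worker (an assumption).
    for start in range(0, len(k), block):
        kj = k[start:start + block]
        vj = v[start:start + block]
        pj = [row[start:start + block] for row in p]
        dpj = matmul(d_out, transpose(vj))
        dsj = [[pv * (dpv - di) for pv, dpv in zip(pr, dpr)]
               for pr, dpr, di in zip(pj, dpj, delta)]
        # dk and dv rows of this block belong to this worker: plain writes.
        dv.extend(matmul(transpose(pj), d_out))
        dk.extend([[scale * x for x in row] for row in matmul(transpose(dsj), q)])
        # dq rows are shared by all workers: this += is where a GPU kernel
        # would need an atomic add.
        contrib = matmul(dsj, kj)
        for i in range(n):
            for c in range(d):
                dq[i][c] += scale * contrib[i][c]
    return dq, dk, dv
```

Splitting across query blocks instead would reverse the ownership: each worker would own its `dq` rows, but all workers would accumulate into the same `dk` and `dv`, which is two shared tensors rather than one.
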

| Name |
|---|
| flash_attn |
| fused_softmax |
| rotary |