This is faster since we only need to do atomic adds on dq, instead of atomic adds on both dk and dv.

| Name |
|---|
| test_flash_attn.py |
| test_rotary.py |
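
The note about atomic adds can be illustrated with a small NumPy sketch. It assumes (the listing itself does not say so) that the change refers to an attention backward pass that splits work across blocks of keys/values. Under that assumption each block owns its slice of `dk` and `dv` and writes it exactly once, while every block contributes to all rows of `dq`, so `dq` is the only gradient that needs accumulation across workers. The function names and the `block` parameter are hypothetical, and the sequential loop stands in for parallel workers; this is not the repository's kernel.

```python
import numpy as np


def attention_backward_reference(q, k, v, do):
    """Unblocked backward pass of softmax(q k^T / sqrt(d)) v, used as a check."""
    scale = 1.0 / np.sqrt(q.shape[-1])
    s = (q @ k.T) * scale
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)
    dv = p.T @ do
    dp = do @ v.T
    ds = p * (dp - (dp * p).sum(axis=-1, keepdims=True))
    return (ds @ k) * scale, (ds.T @ q) * scale, dv


def attention_backward_by_kv_block(q, k, v, do, block=4):
    """Same gradients, computed one key/value block at a time.

    Each loop iteration plays the role of one hypothetical parallel worker.
    """
    scale = 1.0 / np.sqrt(q.shape[-1])
    s = (q @ k.T) * scale
    # Row-wise log-sum-exp and output, which a forward pass would normally save.
    row_max = s.max(axis=-1, keepdims=True)
    lse = row_max + np.log(np.exp(s - row_max).sum(axis=-1, keepdims=True))
    out = np.exp(s - lse) @ v
    delta = (do * out).sum(axis=-1, keepdims=True)  # rowsum(dO * O)

    dq = np.zeros_like(q)
    dk = np.zeros_like(k)
    dv = np.zeros_like(v)
    for start in range(0, k.shape[0], block):
        cols = slice(start, start + block)
        p = np.exp((q @ k[cols].T) * scale - lse)  # softmax columns of this block
        ds = p * (do @ v[cols].T - delta)
        # dk and dv for this block are complete after one write: no accumulation.
        dv[cols] = p.T @ do
        dk[cols] = (ds.T @ q) * scale
        # dq receives a contribution from every block; on a GPU with parallel
        # workers this "+=" is the step that would need an atomic add.
        dq += (ds @ k[cols]) * scale
    return dq, dk, dv


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q, do = rng.standard_normal((6, 8)), rng.standard_normal((6, 8))
    k, v = rng.standard_normal((10, 8)), rng.standard_normal((10, 8))
    ref = attention_backward_reference(q, k, v, do)
    got = attention_backward_by_kv_block(q, k, v, do, block=4)
    print(all(np.allclose(a, b) for a, b in zip(ref, got)))
```

Splitting across query blocks instead would reverse the roles: `dq` rows would be owned by one worker each, but both `dk` and `dv` would need cross-worker accumulation, which matches the trade-off the note describes.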