This is faster since we only need to do atomic adds on dq, instead of atomic adds on both dk and dv.
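
The note above can be illustrated with a minimal NumPy sketch. This is not the repository's Triton kernel. It assumes the approach referred to is a backward pass in which each worker owns one block of keys and values, and it uses unscaled, unmasked softmax attention. The function names, the `block` parameter, and the sequential loop standing in for parallel workers are all hypothetical. Under those assumptions, each worker writes its own rows of `dk` and `dv` exclusively, while every worker contributes to all rows of `dq`, so `dq` is the only gradient that would need an atomic add on a GPU.

```python
import numpy as np


def attention_backward_reference(q, k, v, do):
    """Dense reference gradients for o = softmax(q @ k.T) @ v (no scaling, no mask)."""
    s = q @ k.T
    p = np.exp(s - s.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    dv = p.T @ do
    dp = do @ v.T
    ds = p * (dp - (dp * p).sum(axis=1, keepdims=True))
    return ds @ k, ds.T @ q, dv  # dq, dk, dv


def attention_backward_by_key_block(q, k, v, do, block=4):
    """Blocked backward pass where each loop iteration owns one K/V block.

    Illustrative only: the loop body stands in for one parallel worker.
    """
    # Per-row statistics that a forward pass would normally have saved.
    s = q @ k.T
    m = s.max(axis=1, keepdims=True)
    lse = m + np.log(np.exp(s - m).sum(axis=1, keepdims=True))  # log-sum-exp per query row
    o = np.exp(s - lse) @ v
    delta = (do * o).sum(axis=1, keepdims=True)  # rowsum(dO * O)

    dq = np.zeros_like(q)
    dk = np.zeros_like(k)
    dv = np.zeros_like(v)

    for start in range(0, k.shape[0], block):
        cols = slice(start, start + block)
        p = np.exp(q @ k[cols].T - lse)  # softmax probabilities for this K/V block
        dp = do @ v[cols].T
        ds = p * (dp - delta)

        # Owned exclusively by this worker: a plain store is enough.
        dv[cols] = p.T @ do
        dk[cols] = ds.T @ q

        # Shared across all workers: on a GPU this would be an atomic add.
        dq += ds @ k[cols]

    return dq, dk, dv
```

If the work were instead split across query blocks, the roles would flip: each worker would own its rows of `dq`, but both `dk` and `dv` would need shared accumulation.
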
| Name |
|---|
| utils |
| __init__.py |
| bert_padding.py |
| flash_attention.py |
| flash_attn_interface.py |
| flash_attn_triton_og.py |
| flash_attn_triton.py |
| flash_blocksparse_attention.py |
| flash_blocksparse_attn_interface.py |
| fused_softmax.py |
| rotary.py |