* hard-code alibi in fwd * use params.h as hun_heads * hard-code alibi in bwd * add alibi on/off option * compute alibi_start, ratio outside of kernels * fix minor merge conflict * add test_alibi.py * change apply_alibi() location before masking * add alibi in splitkv kernel * fix backward func # of returns * add out-of-bound check in apply_alibi() * update test_alibi.py * update test_alibi.py for kvcache * simplify alibi parameter interface * fix performance issue by computing alibi outside of branch * update test_flash_attn_varlen_func() for left padding * implement alibi_slopes (b, nh) loading * optimize apply_alibi() a bit * update test cases for alibi_slopes loading * reflect stylistic comments * disable "seqlenq_ngroups_swapped" when using alibi --------- Co-authored-by: monk.detective <monk.detective@kakaobrain.com> |
||
|---|---|---|
| .. | ||
| layers | ||
| losses | ||
| models | ||
| modules | ||
| ops | ||
| utils | ||
| __init__.py | ||
| bert_padding.py | ||
| flash_attn_interface.py | ||
| flash_attn_triton_og.py | ||
| flash_attn_triton.py | ||
| flash_blocksparse_attention.py | ||
| flash_blocksparse_attn_interface.py | ||
| fused_softmax.py | ||
| pyproject.toml | ||