flash-attention/tests
Sanghun Cho e4f726fc44
Support alibi, by Sanghun Cho from Kakao Brain
* hard-code alibi in fwd
* use params.h as num_heads
* hard-code alibi in bwd
* add alibi on/off option
* compute alibi_start, ratio outside of kernels
* fix minor merge conflict
* add test_alibi.py
* change apply_alibi() location to before masking
* add alibi in splitkv kernel
* fix backward func's number of returns
* add out-of-bound check in apply_alibi()
* update test_alibi.py
* update test_alibi.py for kvcache
* simplify alibi parameter interface
* fix performance issue by computing alibi outside of branch
* update test_flash_attn_varlen_func() for left padding
* implement alibi_slopes (b, nh) loading
* optimize apply_alibi() a bit
* update test cases for alibi_slopes loading
* reflect stylistic comments
* disable "seqlenq_ngroups_swapped" when using alibi

---------

Co-authored-by: monk.detective <monk.detective@kakaobrain.com>
2023-12-19 22:56:06 -08:00
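The commit above hard-codes an ALiBi bias into the forward and backward kernels, with per-head slopes loaded as `alibi_slopes` of shape `(num_heads,)` or `(batch, num_heads)`. A minimal NumPy sketch of the math those kernels implement is below; `apply_alibi` mirrors the name used in the commit, while the slope helper and the alignment of query/key positions (so the last query lines up with the last key, as needed for the kvcache/splitkv cases) are illustrative assumptions, not the actual CUDA code.

```python
import numpy as np

def alibi_slopes(num_heads):
    # Geometric slopes 2^(-8/n), 2^(-16/n), ... per head
    # (closed form assumes num_heads is a power of 2).
    return np.array([2.0 ** (-8.0 * (i + 1) / num_heads) for i in range(num_heads)])

def apply_alibi(scores, slopes, seqlen_q, seqlen_k):
    # scores: (num_heads, seqlen_q, seqlen_k) pre-softmax attention logits.
    # Bias is -slope * distance; query positions are shifted by
    # (seqlen_k - seqlen_q) so the last query aligns with the last key,
    # which matters when seqlen_q != seqlen_k (e.g. decoding with a KV cache).
    q_pos = np.arange(seqlen_q)[:, None] + (seqlen_k - seqlen_q)
    k_pos = np.arange(seqlen_k)[None, :]
    bias = -np.abs(q_pos - k_pos)                 # (seqlen_q, seqlen_k)
    return scores + slopes[:, None, None] * bias  # broadcast over heads
```

As the commit notes, the bias is applied before masking, and computing it unconditionally (outside any branch) avoided a performance issue in the kernels.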
layers Run isort and black on test files 2023-08-18 20:59:35 -07:00
losses [CrossEntropy] Test longer sequences 2023-12-16 19:11:23 -08:00
models [Llama] Fix some tests, add tests for Llama 2 and CodeLlama 2023-09-20 23:36:46 -07:00
modules Run isort and black on test files 2023-08-18 20:59:35 -07:00
ops [LayerNorm] Implement dropout in fused residual + LN/RMSNorm 2023-12-19 16:26:07 -08:00
pyproject.toml Move pyproject.toml to flash-attn and tests dir to avoid PEP 517 2023-08-25 15:05:28 -07:00
test_alibi.py Support alibi, by Sanghun Cho from Kakao Brain 2023-12-19 22:56:06 -08:00
test_flash_attn.py [Gen] Accept cache_batch_idx to index into the KV cache 2023-10-03 16:27:26 -07:00
test_rotary.py [Rotary] Implement varlen rotary 2023-09-03 17:57:10 -07:00