Commit Graph

18 Commits

Author SHA1 Message Date
Tri Dao
5aca153d6d Move bwd preprocess kernels to a separate file 2024-01-14 16:57:03 -08:00
Tri Dao
6777336a1c Move masking to a separate file (mask.h) 2024-01-14 12:43:47 -08:00
Tri Dao
9448264ddd Remove seqq_parallel backward kernel that's not used 2024-01-14 12:25:49 -08:00
Tri Dao
732654583c Implement deterministic backward (thanks to Meituan) 2023-12-23 17:57:36 -08:00
Tri Dao
5ab9b3667b Clean up alibi, implement non-causal alibi 2023-12-21 22:27:40 -08:00
Sanghun Cho
e4f726fc44
Support alibi, by Sanghun Cho from Kakao Brain
* hard-code alibi in fwd

* use params.h as hun_heads

* hard-code alibi in bwd

* add alibi on/off option

* compute alibi_start, ratio outside of kernels

* fix minor merge conflict

* add test_alibi.py

* change apply_alibi() location before masking

* add alibi in splitkv kernel

* fix backward func # of returns

* add out-of-bound check in apply_alibi()

* update test_alibi.py

* update test_alibi.py for kvcache

* simplify alibi parameter interface

* fix performance issue
by computing alibi outside of branch

* update test_flash_attn_varlen_func() for left padding

* implement alibi_slopes (b, nh) loading

* optimize apply_alibi() a bit

* update test cases for alibi_slopes loading

* reflect stylistic comments

* disable "seqlenq_ngroups_swapped" when using alibi

---------

Co-authored-by: monk.detective <monk.detective@kakaobrain.com>
2023-12-19 22:56:06 -08:00
Tri Dao
b4bf9cc1f3 Fix performance regression with causal 2023-11-26 19:07:25 -08:00
Driss Guessous
dc4b9ad6c4
add checks (#640) 2023-11-19 20:43:27 -08:00
Tri Dao
5a83425442 Change constexpr int to constexpr static int 2023-10-08 16:26:33 -07:00
Tri Dao
083e8f525f Implement local attention
Co-authored-by: Timothee Lacroix <t@mistral.ai>
2023-09-26 16:31:08 -07:00
Tri Dao
1879e089c7 Reduce number of templates for headdim > 128 2023-09-23 22:24:30 -07:00
Tri Dao
2d8ea9a530 Swap seqlen_q and ngroups when seqlen_q=1 (h/t Daniel Haziza) 2023-09-20 23:38:22 -07:00
Tri Dao
43617deab9 Remove template for (IsEvenMN=T, IsEvenK=F) to speed up compilation 2023-09-18 12:21:36 -07:00
Tri Dao
6a89b2f121 Remove constexpr in launch template to fix CI compilation 2023-09-03 22:59:41 -07:00
Sophia Wisdom
37e32febba
Remove commented out code in bwd (#512)
* Remove lots of comments

* Remove unused traits
2023-09-01 16:43:58 -07:00
Tri Dao
b1fbbd8337 Implement splitKV attention 2023-08-29 00:58:29 -07:00
Tri Dao
a4f148b6ab Fix masking of bwd when seqlen is not divisible by 128 2023-07-31 17:46:34 -07:00
Tri Dao
4f285b3547 FlashAttention-2 release 2023-07-17 06:21:34 -07:00