Tri Dao
|
5aca153d6d
|
Move bwd preprocess kernels to a separate file
|
2024-01-14 16:57:03 -08:00 |
|
Tri Dao
|
6777336a1c
|
Move masking to a separate file (mask.h)
|
2024-01-14 12:43:47 -08:00 |
|
Tri Dao
|
9448264ddd
|
Remove seqq_parallel backward kernel that's not used
|
2024-01-14 12:25:49 -08:00 |
|
Tri Dao
|
732654583c
|
Implement deterministic backward (thanks to Meituan)
|
2023-12-23 17:57:36 -08:00 |
|
Tri Dao
|
5ab9b3667b
|
Clean up alibi, implement non-causal alibi
|
2023-12-21 22:27:40 -08:00 |
|
Sanghun Cho
|
e4f726fc44
|
Support alibi, by Sanghun Cho from Kakao Brain
* hard-code alibi in fwd
* use params.h as num_heads
* hard-code alibi in bwd
* add alibi on/off option
* compute alibi_start, ratio outside of kernels
* fix minor merge conflict
* add test_alibi.py
* change apply_alibi() location before masking
* add alibi in splitkv kernel
* fix backward func # of returns
* add out-of-bound check in apply_alibi()
* update test_alibi.py
* update test_alibi.py for kvcache
* simplify alibi parameter interface
* fix performance issue by computing alibi outside of branch
* update test_flash_attn_varlen_func() for left padding
* implement alibi_slopes (b, nh) loading
* optimize apply_alibi() a bit
* update test cases for alibi_slopes loading
* reflect stylistic comments
* disable "seqlenq_ngroups_swapped" when using alibi
---------
Co-authored-by: monk.detective <monk.detective@kakaobrain.com>
|
2023-12-19 22:56:06 -08:00 |
|
Tri Dao
|
b4bf9cc1f3
|
Fix performance regression with causal
|
2023-11-26 19:07:25 -08:00 |
|
Driss Guessous
|
dc4b9ad6c4
|
add checks (#640)
|
2023-11-19 20:43:27 -08:00 |
|
Tri Dao
|
5a83425442
|
Change constexpr int to constexpr static int
|
2023-10-08 16:26:33 -07:00 |
|
Tri Dao
|
083e8f525f
|
Implement local attention
Co-authored-by: Timothee Lacroix <t@mistral.ai>
|
2023-09-26 16:31:08 -07:00 |
|
Tri Dao
|
1879e089c7
|
Reduce number of templates for headdim > 128
|
2023-09-23 22:24:30 -07:00 |
|
Tri Dao
|
2d8ea9a530
|
Swap seqlen_q and ngroups when seqlen_q=1 (h/t Daniel Haziza)
|
2023-09-20 23:38:22 -07:00 |
|
Tri Dao
|
43617deab9
|
Remove template for (IsEvenMN=T, IsEvenK=F) to speed up compilation
|
2023-09-18 12:21:36 -07:00 |
|
Tri Dao
|
6a89b2f121
|
Remove constexpr in launch template to fix CI compilation
|
2023-09-03 22:59:41 -07:00 |
|
Sophia Wisdom
|
37e32febba
|
Remove commented out code in bwd (#512)
* Remove lots of comments
* Remove unused traits
|
2023-09-01 16:43:58 -07:00 |
|
Tri Dao
|
b1fbbd8337
|
Implement splitKV attention
|
2023-08-29 00:58:29 -07:00 |
|
Tri Dao
|
a4f148b6ab
|
Fix masking of bwd when seqlen is not divisible by 128
|
2023-07-31 17:46:34 -07:00 |
|
Tri Dao
|
4f285b3547
|
FlashAttention-2 release
|
2023-07-17 06:21:34 -07:00 |
|