Commit Graph

120 Commits

Author SHA1 Message Date
Tri Dao
dca6d89da4 Don't support softcap and dropout at the same time
These tests are failing so I'm just disabling this case for now
2024-07-10 11:23:12 -07:00
Tri Dao
908511b2b6 Split into more .cu files to speed up compilation 2024-07-10 00:24:04 -07:00
Tri Dao
1d536d7de5 Minor cleanup of softcapping 2024-07-09 22:57:03 -07:00
Nicolas Patry
8f873cc6ac
Implement softcapping. (#1025)
* Softcap v2 (fwd only).

* Some missing interface + remove overrides in tests.
2024-07-08 11:24:48 -07:00
66RING
9486635c92
Fix typos of comments about shape. (#837) 2024-06-30 22:40:59 -07:00
Liang
ab59ec3590
remove swizzle part of sV.data() to get a completely non-swizzle sVtNoSwizzle (#984)
Co-authored-by: zl <zl@deepseek.com>
2024-06-30 22:38:44 -07:00
Grigory Sizov
f816dee63c
Support unpadded LSE layout (#970)
* Support unpadded LSE layout.

Co-authored-by: Xinfeng Xie <xfxie.ceca@gmail.com>
Co-authored-by: Jianyu Huang <hjyahead@gmail.com>

* Cleanup

* Fix unpadded LSE on split-kv path

* Fix formatting and comments

* Fix inline vs forceinline

---------

Co-authored-by: Xinfeng Xie <xfxie.ceca@gmail.com>
Co-authored-by: Jianyu Huang <hjyahead@gmail.com>
2024-06-27 02:38:13 -07:00
Tri Dao
d732be1e67 Update to Cutlass 3.5 2024-05-26 12:49:33 -07:00
Tri Dao
656daef4ea Use Cute's local_tile to get gQ, gK, gV 2024-04-07 20:10:19 -07:00
Driss Guessous
23e8fa5a26
Add the option for the macro and note (#893) 2024-03-27 19:12:11 -07:00
ljss
3e9414f1c3
Minor fix in compute_attn_1rowblock_splitkv (#900) 2024-03-27 19:11:45 -07:00
Driss Guessous
4a73e903da
Add in macros for defining __grid_constant__ (#852) 2024-03-15 00:48:54 -07:00
Tri Dao
2406f28805 Enable headdim 256 backward on consumer GPUs (Ampere, Ada) 2024-02-21 15:56:19 -08:00
Tri Dao
b32efb1a4d Don't need to reduce row_sum during online softmax 2024-02-20 13:33:38 -08:00
Jeremy Reizenstein
0658e320f6
Preprocessor switches to control functionality (#788)
For faster and smaller builds in some simple cases, provide switches to allow disabling:
- backward
- alibi
- uneven k
- dropout
- local attention

Co-authored-by: Jeremy Francis Reizenstein <bottler@users.noreply.github.com>
2024-01-29 20:44:23 -08:00
Tri Dao
54e80a3829 Implement page KV cache
Co-authored-by: ljss <450993438@qq.com>
2024-01-22 22:47:30 -08:00
Tri Dao
36bc29edf7 Use int64_t instead of uint32_t in kernel_traits.h 2024-01-22 22:39:29 -08:00
Tri Dao
000b67f5d8 Use int64_t instead of uint32_t for index_t 2024-01-22 11:25:50 -08:00
Tri Dao
ea8a25ca38 Remove configure in bwd kernel launch 2024-01-21 15:28:33 -08:00
Tri Dao
8f4d82cf5e Update cutlass to v3.4.0 2024-01-20 22:30:06 -08:00
Tri Dao
395e5a0dba Move rotary device functions to a separate file 2024-01-20 18:01:18 -08:00
Tri Dao
3e2c827d9a Remove unused kernel_traits file 2024-01-20 17:41:44 -08:00
Tri Dao
66a127aef8 Refactor masking in fwd pass into 1 object 2024-01-20 17:39:53 -08:00
Tri Dao
ed4959b2eb Change inline to __forceinline__, use __grid_constant__ param 2024-01-20 17:38:47 -08:00
Tri Dao
6f706eff96 Make Softmax an object 2024-01-19 16:09:31 -08:00
Tri Dao
4ea866ca19 Make Alibi an object 2024-01-15 00:07:11 -08:00
Tri Dao
5aca153d6d Move bwd preprocess kernels to a separate file 2024-01-14 16:57:03 -08:00
Tri Dao
df1418f9db Move softmax_rescale_o to softmax.h 2024-01-14 15:06:06 -08:00
Tri Dao
6777336a1c Move masking to a separate file (mask.h) 2024-01-14 12:43:47 -08:00
Tri Dao
9448264ddd Remove seqq_parallel backward kernel that's not used 2024-01-14 12:25:49 -08:00
Tri Dao
1274ec3e7e Move dropout to a separate file (dropout.h) 2024-01-14 12:19:17 -08:00
Tri Dao
10dad61277 apply_dropout now takes tensor of rowcol layout 2024-01-14 01:03:23 -08:00
Tri Dao
d9cbcfb41c Remove dead code in philox.cuh 2024-01-13 02:02:03 -08:00
Tri Dao
a7b66ae25a Simplify writing softmax to gmem 2024-01-13 00:25:04 -08:00
Tri Dao
8d1b169ed1 Simplify SmemLayoutVtransposed in kernel_traits.h 2024-01-12 11:53:29 -08:00
Tri Dao
732654583c Implement deterministic backward (thanks to Meituan) 2023-12-23 17:57:36 -08:00
Tri Dao
5ab9b3667b Clean up alibi, implement non-causal alibi 2023-12-21 22:27:40 -08:00
Sanghun Cho
e4f726fc44
Support alibi, by Sanghun Cho from Kakao Brain
* hard-code alibi in fwd

* use params.h as num_heads

* hard-code alibi in bwd

* add alibi on/off option

* compute alibi_start, ratio outside of kernels

* fix minor merge conflict

* add test_alibi.py

* change apply_alibi() location before masking

* add alibi in splitkv kernel

* fix backward func # of returns

* add out-of-bound check in apply_alibi()

* update test_alibi.py

* update test_alibi.py for kvcache

* simplify alibi parameter interface

* fix performance issue by computing alibi outside of branch

* update test_flash_attn_varlen_func() for left padding

* implement alibi_slopes (b, nh) loading

* optimize apply_alibi() a bit

* update test cases for alibi_slopes loading

* reflect stylistic comments

* disable "seqlenq_ngroups_swapped" when using alibi

---------

Co-authored-by: monk.detective <monk.detective@kakaobrain.com>
2023-12-19 22:56:06 -08:00
Jeremy Reizenstein
ce3e7280f8
Allow varlen_fwd to take optional seqused_k (#647)
Co-authored-by: bottler <bottler@users.noreply.github.com>
2023-11-27 00:41:23 -08:00
Tri Dao
b4bf9cc1f3 Fix performance regression with causal 2023-11-26 19:07:25 -08:00
Tri Dao
db2f80692c Write zero to out / grad if seqlen_q or seqlen_k is zero 2023-11-19 22:20:01 -08:00
Driss Guessous
dc4b9ad6c4
add checks (#640) 2023-11-19 20:43:27 -08:00
Tri Dao
5a83425442 Change constexpr int to constexpr static int 2023-10-08 16:26:33 -07:00
Tri Dao
e279bf8ed9 [Gen] Accept cache_batch_idx to index into the KV cache 2023-10-03 16:27:26 -07:00
Tri Dao
083e8f525f Implement local attention
Co-authored-by: Timothee Lacroix <t@mistral.ai>
2023-09-26 16:31:08 -07:00
Tri Dao
65c234ed90 Don't over-allocate dq_accum in case of varlen 2023-09-24 00:36:07 -07:00
Tri Dao
1879e089c7 Reduce number of templates for headdim > 128 2023-09-23 22:24:30 -07:00
Tri Dao
2d8ea9a530 Swap seqlen_q and ngroups when seqlen_q=1 (h/t Daniel Haziza) 2023-09-20 23:38:22 -07:00
Tri Dao
43617deab9 Remove template for (IsEvenMN=T, IsEvenK=F) to speed up compilation 2023-09-18 12:21:36 -07:00
Tri Dao
c984208ddb Set block size to 64 x 64 for kvcache to avoid nvcc segfaults 2023-09-17 16:14:58 -07:00