flash-attention

Author	SHA1	Message	Date
Tri Dao	dca6d89da4	Don't support softcap and dropout at the same time. These tests are failing so I'm just disabling this case for now	2024-07-10 11:23:12 -07:00
Tri Dao	908511b2b6	Split into more .cu files to speed up compilation	2024-07-10 00:24:04 -07:00
Tri Dao	1d536d7de5	Minor cleanup of softcapping	2024-07-09 22:57:03 -07:00
Nicolas Patry	8f873cc6ac	Implement softcapping. (#1025) * Softcap v2 (fwd only). * Some missing interface + remove overrides in tests.	2024-07-08 11:24:48 -07:00
66RING	9486635c92	Fix typos of comments about shape. (#837)	2024-06-30 22:40:59 -07:00
Liang	ab59ec3590	remove swizzle part of `sV.data()` to get a completely non-swizzle `sVtNoSwizzle` (#984) Co-authored-by: zl <zl@deepseek.com>	2024-06-30 22:38:44 -07:00
Grigory Sizov	f816dee63c	Support unpadded LSE layout (#970) * Support unpadded LSE layout. * Cleanup * Fix unpadded LSE on split-kv path * Fix formatting and comments * Fix inline vs forceinline. Co-authored-by: Xinfeng Xie <xfxie.ceca@gmail.com> Co-authored-by: Jianyu Huang <hjyahead@gmail.com>	2024-06-27 02:38:13 -07:00
Tri Dao	d732be1e67	Update to Cutlass 3.5	2024-05-26 12:49:33 -07:00
Tri Dao	656daef4ea	Use Cute's local_tile to get gQ, gK, gV	2024-04-07 20:10:19 -07:00
Driss Guessous	23e8fa5a26	Add the option for the macro and note (#893)	2024-03-27 19:12:11 -07:00
ljss	3e9414f1c3	Minor fix in compute_attn_1rowblock_splitkv (#900)	2024-03-27 19:11:45 -07:00
Driss Guessous	4a73e903da	Add in macros for defining __grid_constant__ (#852)	2024-03-15 00:48:54 -07:00
Tri Dao	2406f28805	Enable headdim 256 backward on consumer GPUs (Ampere, Ada)	2024-02-21 15:56:19 -08:00
Tri Dao	b32efb1a4d	Don't need to reduce row_sum during online softmax	2024-02-20 13:33:38 -08:00
Jeremy Reizenstein	0658e320f6	Preprocessor switches to control functionality (#788) For faster and smaller builds in some simple cases, provide switches to allow disabling: backward, alibi, uneven k, dropout, local attention. Co-authored-by: Jeremy Francis Reizenstein <bottler@users.noreply.github.com>	2024-01-29 20:44:23 -08:00
Tri Dao	54e80a3829	Implement page KV cache Co-authored-by: ljss <450993438@qq.com>	2024-01-22 22:47:30 -08:00
Tri Dao	36bc29edf7	Use int64_t instead of uint32_t in kernel_traits.h	2024-01-22 22:39:29 -08:00
Tri Dao	000b67f5d8	Use int64_t instead of uint32_t for index_t	2024-01-22 11:25:50 -08:00
Tri Dao	ea8a25ca38	Remove configure in bwd kernel launch	2024-01-21 15:28:33 -08:00
Tri Dao	8f4d82cf5e	Update cutlass to v3.4.0	2024-01-20 22:30:06 -08:00
Tri Dao	395e5a0dba	Move rotary device functions to a separate file	2024-01-20 18:01:18 -08:00
Tri Dao	3e2c827d9a	Remove unused kernel_traits file	2024-01-20 17:41:44 -08:00
Tri Dao	66a127aef8	Refactor masking in fwd pass into 1 object	2024-01-20 17:39:53 -08:00
Tri Dao	ed4959b2eb	Change inline to __forceinline__, use __grid_constant__ param	2024-01-20 17:38:47 -08:00
Tri Dao	6f706eff96	Make Softmax an object	2024-01-19 16:09:31 -08:00
Tri Dao	4ea866ca19	Make Alibi an object	2024-01-15 00:07:11 -08:00
Tri Dao	5aca153d6d	Move bwd preprocess kernels to a separate file	2024-01-14 16:57:03 -08:00
Tri Dao	df1418f9db	Move softmax_rescale_o to softmax.h	2024-01-14 15:06:06 -08:00
Tri Dao	6777336a1c	Move masking to a separate file (mask.h)	2024-01-14 12:43:47 -08:00
Tri Dao	9448264ddd	Remove seqq_parallel backward kernel that's not used	2024-01-14 12:25:49 -08:00
Tri Dao	1274ec3e7e	Move dropout to a separate file (dropout.h)	2024-01-14 12:19:17 -08:00
Tri Dao	10dad61277	apply_dropout now takes tensor of rowcol layout	2024-01-14 01:03:23 -08:00
Tri Dao	d9cbcfb41c	Remove dead code in philox.cuh	2024-01-13 02:02:03 -08:00
Tri Dao	a7b66ae25a	Simplify writing softmax to gmem	2024-01-13 00:25:04 -08:00
Tri Dao	8d1b169ed1	Simplify SmemLayoutVtransposed in kernel_traits.h	2024-01-12 11:53:29 -08:00
Tri Dao	732654583c	Implement deterministic backward (thanks to Meituan)	2023-12-23 17:57:36 -08:00
Tri Dao	5ab9b3667b	Clean up alibi, implement non-causal alibi	2023-12-21 22:27:40 -08:00
Sanghun Cho	e4f726fc44	Support alibi, by Sanghun Cho from Kakao Brain * hard-code alibi in fwd * use params.h as hun_heads * hard-code alibi in bwd * add alibi on/off option * compute alibi_start, ratio outside of kernels * fix minor merge conflict * add test_alibi.py * change apply_alibi() location before masking * add alibi in splitkv kernel * fix backward func # of returns * add out-of-bound check in apply_alibi() * update test_alibi.py * update test_alibi.py for kvcache * simplify alibi parameter interface * fix performance issue by computing alibi outside of branch * update test_flash_attn_varlen_func() for left padding * implement alibi_slopes (b, nh) loading * optimize apply_alibi() a bit * update test cases for alibi_slopes loading * reflect stylistic comments * disable "seqlenq_ngroups_swapped" when using alibi --------- Co-authored-by: monk.detective <monk.detective@kakaobrain.com>	2023-12-19 22:56:06 -08:00
Jeremy Reizenstein	ce3e7280f8	Allow varlen_fwd to take optional seqused_k (#647) Co-authored-by: bottler <bottler@users.noreply.github.com>	2023-11-27 00:41:23 -08:00
Tri Dao	b4bf9cc1f3	Fix performance regression with causal	2023-11-26 19:07:25 -08:00
Tri Dao	db2f80692c	Write zero to out / grad if seqlen_q or seqlen_k is zero	2023-11-19 22:20:01 -08:00
Driss Guessous	dc4b9ad6c4	add checks (#640)	2023-11-19 20:43:27 -08:00
Tri Dao	5a83425442	Change constexpr int to constexpr static int	2023-10-08 16:26:33 -07:00
Tri Dao	e279bf8ed9	[Gen] Accept cache_batch_idx to index into the KV cache	2023-10-03 16:27:26 -07:00
Tri Dao	083e8f525f	Implement local attention Co-authored-by: Timothee Lacroix <t@mistral.ai>	2023-09-26 16:31:08 -07:00
Tri Dao	65c234ed90	Don't over-allocate dq_accum in case of varlen	2023-09-24 00:36:07 -07:00
Tri Dao	1879e089c7	Reduce number of templates for headdim > 128	2023-09-23 22:24:30 -07:00
Tri Dao	2d8ea9a530	Swap seqlen_q and ngroups when seqlen_q=1 (h/t Daniel Haziza)	2023-09-20 23:38:22 -07:00
Tri Dao	43617deab9	Remove template for (IsEvenMN=T, IsEvenK=F) to speed up compilation	2023-09-18 12:21:36 -07:00
Tri Dao	c984208ddb	Set block size to 64 x 64 for kvcache to avoid nvcc segfaults	2023-09-17 16:14:58 -07:00


120 Commits