Commit Graph

553 Commits

Author SHA1 Message Date
Tri Dao
395e5a0dba Move rotary device functions to a separate file 2024-01-20 18:01:18 -08:00
Tri Dao
3e2c827d9a Remove unused kernel_traits file 2024-01-20 17:41:44 -08:00
Tri Dao
66a127aef8 Refactor masking in fwd pass into 1 object 2024-01-20 17:39:53 -08:00
Tri Dao
ed4959b2eb Change inline to __forceinline__, use __grid_constant__ param 2024-01-20 17:38:47 -08:00
Tri Dao
6f706eff96 Make Softmax an object 2024-01-19 16:09:31 -08:00
Tri Dao
4ea866ca19 Make Alibi an object 2024-01-15 00:07:11 -08:00
Tri Dao
5aca153d6d Move bwd preprocess kernels to a separate file 2024-01-14 16:57:03 -08:00
Tri Dao
df1418f9db Move softmax_rescale_o to softmax.h 2024-01-14 15:06:06 -08:00
Tri Dao
6777336a1c Move masking to a separate file (mask.h) 2024-01-14 12:43:47 -08:00
Tri Dao
9448264ddd Remove seqq_parallel backward kernel that's not used 2024-01-14 12:25:49 -08:00
Tri Dao
1274ec3e7e Move dropout to a separate file (dropout.h) 2024-01-14 12:19:17 -08:00
Tri Dao
10dad61277 apply_dropout now takes tensor of rowcol layout 2024-01-14 01:03:23 -08:00
Tri Dao
d9cbcfb41c Remove dead code in philox.cuh 2024-01-13 02:02:03 -08:00
Tri Dao
a7b66ae25a Simplify writing softmax to gmem 2024-01-13 00:25:04 -08:00
Tri Dao
8d1b169ed1 Simplify SmemLayoutVtransposed in kernel_traits.h 2024-01-12 11:53:29 -08:00
Tri Dao
c9861a032d [LayerNorm] Initialize mean and rstd tensor using x.device 2024-01-09 16:30:31 -08:00
Erich Schubert
99ea4baa1d Typo in README (#760) 2024-01-08 09:59:00 -08:00
Tri Dao
abbc131173 [LayerNorm] Switch from CUDA to Triton implementation 2024-01-05 00:31:17 -08:00
Tri Dao
f5b308e258 [LayerNorm] Rename layernorm.py -> layer_norm.py 2024-01-05 00:21:03 -08:00
Tri Dao
665b55e2e2 [LayerNorm] Implement parallel layer norm in Triton 2024-01-04 23:15:35 -08:00
Tri Dao
aa5c6438c5 [LayerNorm] Implement rowscale in Triton layernorm 2024-01-04 01:07:03 -08:00
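For the rowscale entry above, a minimal unfused sketch of the assumed semantics: each row of the input is scaled by its own factor before normalization. Function name and shapes here are illustrative, not the library's API.

```python
import torch
import torch.nn.functional as F

def layer_norm_rowscale_ref(x, weight, bias, rowscale):
    # Illustrative unfused reference (not the Triton kernel itself):
    # x: (batch, hidden), rowscale: (batch,), one scale factor per row.
    z = x * rowscale.unsqueeze(-1)
    return F.layer_norm(z, (z.shape[-1],), weight, bias)
```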
jiaxingli
386e391117
Fix: implement deterministic backward in mha (#748)
* fix deterministic
2024-01-02 18:13:56 -08:00
Tri Dao
1a2c3e8c25 Bump to v2.4.2 2023-12-25 16:28:57 -08:00
Tri Dao
73df3be7d5 Add test for BTLM init 2023-12-25 15:16:27 -08:00
Tri Dao
7ffba9a501 Implement BTLM model 2023-12-24 20:35:12 -08:00
Tri Dao
2e29dacf0c Implement muParam 2023-12-24 20:34:48 -08:00
Tri Dao
3f7d5786ba Pass alibi slopes to flash_attn_with_kvcache during generation 2023-12-24 20:31:59 -08:00
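A sketch of what the entry above enables, assuming the v2.4 Python API in which flash_attn_with_kvcache takes an alibi_slopes keyword; shapes and values are illustrative.

```python
import torch
from flash_attn import flash_attn_with_kvcache

batch, nheads, headdim = 2, 16, 64
q = torch.randn(batch, 1, nheads, headdim, device="cuda", dtype=torch.float16)
k_cache = torch.randn(batch, 512, nheads, headdim, device="cuda", dtype=torch.float16)
v_cache = torch.randn_like(k_cache)
cache_seqlens = torch.full((batch,), 300, dtype=torch.int32, device="cuda")
alibi_slopes = torch.rand(nheads, device="cuda", dtype=torch.float32)  # one slope per head

out = flash_attn_with_kvcache(
    q, k_cache, v_cache,
    cache_seqlens=cache_seqlens,
    causal=True,
    alibi_slopes=alibi_slopes,  # assumed keyword, added around v2.4
)
```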
Tri Dao
f844852485 Bump to v2.4.1 2023-12-23 21:00:39 -08:00
Tri Dao
0842ec0da4 Don't dispatch to local if window size >= seqlen_k 2023-12-23 20:59:26 -08:00
Tri Dao
732654583c Implement deterministic backward (thanks to Meituan) 2023-12-23 17:57:36 -08:00
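The deterministic backward is opt-in; a short sketch, assuming it is exposed as a deterministic keyword on flash_attn_func as of v2.4.1. The trade-off is bitwise-reproducible gradients for a somewhat slower backward pass.

```python
import torch
from flash_attn import flash_attn_func

q = torch.randn(2, 1024, 16, 64, device="cuda", dtype=torch.float16, requires_grad=True)
k = torch.randn_like(q).requires_grad_()
v = torch.randn_like(q).requires_grad_()

# deterministic=True (assumed keyword) makes gradients reproducible across runs.
out = flash_attn_func(q, k, v, causal=True, deterministic=True)
out.sum().backward()
```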
Tri Dao
2c7d7b7396 Implement norm head for Baichuan2 2023-12-22 16:55:40 -08:00
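A minimal sketch of the norm-head idea, assuming it follows Baichuan2's published recipe of L2-normalizing the output-projection weight rows before computing logits; names are illustrative.

```python
import torch
import torch.nn.functional as F

def norm_head_ref(hidden, weight):
    # hidden: (batch, seqlen, d), weight: (vocab, d).
    # Each vocab embedding row is L2-normalized before the logit projection.
    return F.linear(hidden, F.normalize(weight, dim=-1))
```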
Tri Dao
68f178aa4b [CI] Don't compile for python 3.7 pytorch 2.2 2023-12-22 10:10:02 -08:00
Tri Dao
7316277303 Bump to v2.4.0 2023-12-22 00:09:53 -08:00
Tri Dao
50d144c906 Mention Alibi in README 2023-12-21 23:48:16 -08:00
Tri Dao
8448c02889 Update cutlass to v3.3.0 2023-12-21 23:25:50 -08:00
Tri Dao
c3b2196652 Add Alibi to MHA, test with Baichuan-13B 2023-12-21 22:49:55 -08:00
Tri Dao
701b51bfc3 [CI] Use torch-nightly 20231106 instead of 20231127 2023-12-21 22:28:09 -08:00
Tri Dao
5ab9b3667b Clean up alibi, implement non-causal alibi 2023-12-21 22:27:40 -08:00
Tri Dao
bc28eacc60 Format flash_attn_interface.py 2023-12-19 23:13:53 -08:00
Tri Dao
0a146185d6 [Gen] Remove minor dead code 2023-12-19 22:57:39 -08:00
Sanghun Cho
e4f726fc44
Support alibi, by Sanghun Cho from Kakao Brain
* hard-code alibi in fwd

* use params.h as num_heads

* hard-code alibi in bwd

* add alibi on/off option

* compute alibi_start, ratio outside of kernels

* fix minor merge conflict

* add test_alibi.py

* move apply_alibi() to before masking

* add alibi in splitkv kernel

* fix backward func # of returns

* add out-of-bound check in apply_alibi()

* update test_alibi.py

* update test_alibi.py for kvcache

* simplify alibi parameter interface

* fix performance issue by computing alibi outside of branch

* update test_flash_attn_varlen_func() for left padding

* implement alibi_slopes (b, nh) loading

* optimize apply_alibi() a bit

* update test cases for alibi_slopes loading

* reflect stylistic comments

* disable "seqlenq_ngroups_swapped" when using alibi

---------

Co-authored-by: monk.detective <monk.detective@kakaobrain.com>
2023-12-19 22:56:06 -08:00
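As a reference for what the kernel-side apply_alibi() computes, here is an unfused sketch of the ALiBi bias from the original paper: a per-head slope times the query/key distance, added to the attention scores before softmax. The absolute value covers the non-causal variant; slope initialization assumes the head count is a power of two.

```python
import torch

def get_alibi_slopes(nheads: int) -> torch.Tensor:
    # Geometric slopes 2^(-8/n), 2^(-16/n), ... from the ALiBi paper
    # (assumes nheads is a power of two).
    return torch.tensor([2.0 ** (-8.0 * (i + 1) / nheads) for i in range(nheads)])

def alibi_bias(seqlen_q: int, seqlen_k: int, slopes: torch.Tensor) -> torch.Tensor:
    # Queries are aligned to the bottom-right of the score matrix, matching
    # the causal-mask convention; returns (nheads, seqlen_q, seqlen_k).
    i = torch.arange(seqlen_q).view(-1, 1) + (seqlen_k - seqlen_q)
    j = torch.arange(seqlen_k).view(1, -1)
    return -slopes.view(-1, 1, 1) * (i - j).abs()

# scores: (batch, nheads, seqlen_q, seqlen_k), bias added before softmax:
# scores = scores + alibi_bias(seqlen_q, seqlen_k, get_alibi_slopes(nheads))
```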
Tri Dao
cd089597fd [LayerNorm] Implement dropout in fused residual + LN/RMSNorm 2023-12-19 16:26:07 -08:00
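An unfused reference for the semantics of the fused op above, assuming the usual dropout-add-layernorm ordering; names are illustrative.

```python
import torch.nn.functional as F

def dropout_add_layer_norm_ref(x, residual, weight, bias, p, training=True):
    # What the fused kernel computes in a single memory pass:
    # LayerNorm(dropout(x) + residual).
    z = F.dropout(x, p=p, training=training) + residual
    return F.layer_norm(z, (z.shape[-1],), weight, bias)
```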
Tri Dao
713bd3aa9a [CrossEntropy] Test longer sequences 2023-12-16 19:11:23 -08:00
Tri Dao
08124c8f9c [CrossEntropy] Implement logit_scale option 2023-12-16 18:39:37 -08:00
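A one-line unfused reference for the logit_scale option, assuming it scales the logits before the loss as in muP-style setups; keyword placement is illustrative.

```python
import torch.nn.functional as F

def cross_entropy_logit_scale_ref(logits, labels, logit_scale=1.0):
    # A fused kernel can fold this scaling in without materializing
    # logits * logit_scale as a separate tensor.
    return F.cross_entropy(logits * logit_scale, labels)
```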
Tri Dao
9356a1c038 [LayerNorm] Implement layer_norm_linear 2023-11-30 21:46:07 -08:00
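And a reference for layer_norm_linear, assuming it fuses a LayerNorm with the Linear that follows it, a very common transformer pattern; names are illustrative.

```python
import torch.nn.functional as F

def layer_norm_linear_ref(x, norm_weight, norm_bias, lin_weight, lin_bias):
    # Fusing avoids a round trip of the normalized activations to global memory.
    z = F.layer_norm(x, (x.shape[-1],), norm_weight, norm_bias)
    return F.linear(z, lin_weight, lin_bias)
```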
Tri Dao
92dd5703ec Bump to v2.3.6 2023-11-27 16:23:39 -08:00
Tri Dao
d4a7c8ffbb [CI] Only compile for CUDA 11.8 & 12.2, MAX_JOBS=2, add torch-nightly 2023-11-27 16:21:28 -08:00
Jeremy Reizenstein
ce3e7280f8
Allow varlen_fwd to take optional seqused_k (#647)
Co-authored-by: bottler <bottler@users.noreply.github.com>
2023-11-27 00:41:23 -08:00
Tri Dao
23b77c8148 Bump to v2.3.5 2023-11-26 19:08:28 -08:00
Tri Dao
b4bf9cc1f3 Fix performance regression with causal 2023-11-26 19:07:25 -08:00