Commit Graph

  • 7153673c1a Fix swiglu backwards return type (#1337) main Neil Tenenholtz 2024-11-15 19:23:40 -0500
  • 641db759ab [CI] Pytorch 2.5.1 does not support python 3.8 Tri Dao 2024-11-12 20:01:43 -0800
  • 7435839e3d Update README for FA3 Tri Dao 2024-11-12 20:01:07 -0800
  • 241c682c9f [CI] Switch back to CUDA 12.4 Tri Dao 2024-11-12 14:24:27 -0800
  • c555642172 Bump to v2.7.0 Tri Dao 2024-11-12 14:11:44 -0800
  • 6ffeb572b1 [CI] Still use CUDA 12.3 but pull the right pytorch version Tri Dao 2024-11-12 14:04:30 -0800
  • 42f2b8be34 Use CUDA 12.4 in the build system (#1326) Ethan Steinberg 2024-11-12 13:40:38 -0800
  • 2f6c633179 Drop support for Pytorch 2.0 Tri Dao 2024-11-12 11:58:16 -0800
  • 88d1657a14 [AMD ROCm] Fix KVcache bug and improve performance (#1328) rocking 2024-11-13 03:32:11 +0800
  • 284e2c6e5b Make FA3 paged attention ready for upgrade to Cutlass 3.6 (#1331) Kai Londenberg 2024-11-12 14:31:37 -0500
  • b443207c1f Paged Attention support for FA3 (#1268) Kai Londenberg 2024-11-10 02:05:01 +0100
  • f0bf3ed9ab Feat: Add support for PyTorch 2.5 in workflows (#1284) NanoCode012 2024-11-07 15:37:56 +0700
  • 478ee666cc Make namespace comment consistent (#1305) Son Nguyen 2024-10-31 13:32:49 +0800
  • c1d146cbd5 Fix copy-paste error in hopper tests (#1279) milesvant 2024-10-15 13:54:40 -0700
  • a5a75274bc FA3 kvcache + split kv + gqa parallelization (#1236) jayhshah 2024-10-15 00:21:22 -0700
  • bedf877467 [CrossEntropy] Fix where labels address not aligned to 16 bytes Tri Dao 2024-10-05 02:02:24 -0700
  • 53a4f34163 Hotfix due to change of upstream api (#1239) rocking 2024-09-21 03:45:25 +0800
  • 8476986721 Fix FAv3 compilation with MSVC (#1240) hlky 2024-09-20 20:44:59 +0100
  • 9cafd4ae14 Merge pull request #1233 from Dao-AILab/ipiszy/local_attn Ying Zhang 2024-09-19 23:14:45 -0700
  • 1c9717d699 address comments Ying Zhang 2024-09-19 22:00:41 -0700
  • 30e1ef0f79 minify torch.torch.int32 to torch.int32 (#1237) Zhihao Shen 2024-09-18 15:32:59 +0800
  • 83e41b3ca4 Add custom ops for compatibility with PT Compile (#1139) Antoni Viros 2024-09-17 19:49:26 -0700
  • be6c1b98c4 small fixes Ying Zhang 2024-09-16 15:50:55 -0700
  • dff976a84a fixes Ying Zhang 2024-08-30 17:20:18 -0700
  • 7b4e68e04f hopper local attention Ying Zhang 2024-08-29 22:34:31 -0700
  • af314d4006 Merge pull request #1182 from ipiszy/used_q Ying Zhang 2024-09-16 14:57:19 -0700
  • 8cbc8a042f small fixes Ying Zhang 2024-09-16 14:38:43 -0700
  • cdbbe844b1 minor changes to unpad_input test util func Ying Zhang 2024-09-13 17:10:37 -0700
  • db80387343 Add seqused_q in fwd / bwd and seqused_k in bwd. Ying Zhang 2024-08-27 21:41:21 -0700
  • e2182cc21d Support page kvcache in AMD ROCm (#1198) rocking 2024-09-16 14:17:28 +0800
  • cc1690d9d6 [Rotary] Add test for rotary when qkv are packed an there's GQA Tri Dao 2024-09-12 22:35:20 -0700
  • 8c20cfef49 [Rotary] Support qkv block layout from GQA Tri Dao 2024-09-11 10:39:18 -0700
  • bdf733be55 Add q, k, v descales to FA3 interface (#1210) Charlene Yang 2024-09-09 21:53:52 -0700
  • c7f32a8409 [CrossEntropy] Support precomputed LSE Tri Dao 2024-09-08 09:24:18 -0700
  • e371bea04f feat: change minimal supported CUDA version to 11.7 (#1206) juejuezi 2024-09-06 01:34:35 +0800
  • 3cea2fb6ee Add ArchTag to pre/postprocess bwd kernels (#1180) Cameron Shinn 2024-08-28 00:20:47 -0700
  • c92ca63268 FA3 FP8 qkv descales + restore max offset for h128 causal + added sync for producer WG (#1173) jayhshah 2024-08-25 12:18:04 -0700
  • d79f9b41a8 [CrossEntropy] Use online softmax to simplify implementation Tri Dao 2024-08-24 17:39:57 -0700
  • 32792d37ec add missing if condition for key_padding_mask in test_util.py Jay Shah 2024-08-19 11:17:17 -0700
  • 28e7f4ddbd Merge pull request #1155 from ipiszy/fix Ying Zhang 2024-08-17 13:34:06 -0700
  • 53537da422 add a unittest Ying Zhang 2024-08-17 13:23:50 -0700
  • a3a257c71d Fix out-of-bound writes for var-seq-len zero-length KVs Ying Zhang 2024-08-16 01:13:35 -0700
  • bcd918f275 [LayerNorm] Add option to write result to out and residual_out Tri Dao 2024-08-15 14:43:47 -0700
  • bd82d6c6eb Revert "[LayerNorm] Don't store x + residual if we don't need gradients" Tri Dao 2024-08-15 12:02:39 -0700
  • 800401847e [LayerNorm] Don't store x + residual if we don't need gradients Tri Dao 2024-08-15 11:07:46 -0700
  • 16025d8cc9 Clearer install instructions for CUDA and ROCm backends (#1147) Garrett Byrd 2024-08-14 01:21:56 -0400
  • 3669b25206 bwd benchmark + small fixes (#1129) Ying Zhang 2024-08-05 21:27:52 -0700
  • 5d5bfbb619 Remove contiguous checks Tri Dao 2024-08-05 14:46:46 -0700
  • 3f1b4d38e7 Fix: check the type of max_seqlen_k instead of checking max_seqlen twice (#1127) SueJane 2024-08-05 23:59:23 +0800
  • 3f6ff1c1c5 Remove struct : cute::aligned_struct to avoid error with gcc 12 Tri Dao 2024-08-02 00:59:35 -0700
  • c33de664a1 Fix import in test Tri Dao 2024-08-01 02:14:25 -0700
  • bafe253042 [FA3] Bwd Tri Dao 2024-08-01 01:57:06 -0700
  • abffb0f98c Merge pull request #1115 from ipiszy/bench Ying Zhang 2024-07-31 22:42:06 -0700
  • c7f20a2d31 add cudnn benchmark for var-len Ying Zhang 2024-07-31 22:33:29 -0700
  • 5018ac6ac5 Fp8 kernel with "in-kernel" transpose of V in producer (#1100) jayhshah 2024-07-30 14:14:14 -0700
  • c4b9015d74 Add benchmark_gemm.py Tri Dao 2024-07-27 11:13:18 -0700
  • 418d677192 Bump to v2.6.3 Tri Dao 2024-07-25 01:31:28 -0700
  • 65205d350e [CI] Compile for pytorch 2.4.0 Tri Dao 2024-07-25 01:30:34 -0700
  • 3aae9c18c1 Revert "Changes For FP8 (#1075)" Tri Dao 2024-07-25 01:28:44 -0700
  • 1899c970c8 Changes For FP8 (#1075) ganeshcolfax 2024-07-23 13:51:14 -0700
  • 59594f2a67 Bump to v2.6.2 Tri Dao 2024-07-23 02:30:05 -0700
  • 299563626f Fix test with alibi and cache_leftpad Tri Dao 2024-07-23 02:04:15 -0700
  • 4488acee8d [CI] Compile with torch 2.4.0.dev20240527 Tri Dao 2024-07-23 01:33:32 -0700
  • 65f723bb9a Split bwd into more .cu files to speed up compilation Tri Dao 2024-07-23 01:32:09 -0700
  • 5ca83a9c71 Clean up softcapping bwd a bit Tri Dao 2024-07-22 23:42:06 -0700
  • 751c762c9c Don't specialize for hdim 224 to speed up compilation Tri Dao 2024-07-22 23:40:31 -0700
  • 1c275eb070 Fix ima for split-kv kernel (#1085) Driss Guessous 2024-07-22 22:19:46 -0700
  • 3c4053b75c Make FA3 externally importable (#1053) janEbert 2024-07-23 04:34:56 +0000
  • d8f104e97a Support AMD ROCm on FlashAttention 2 (#1010) rocking 2024-07-23 12:34:37 +0800
  • dfe1a59e4b Add var-seq-len to FA3 fp16 / bf16 fwd (#1072) Ying Zhang 2024-07-22 21:32:41 -0700
  • cb516f855b Remove torchlib dependency from cpp files (#1083) Cameron Shinn 2024-07-22 16:47:09 -0700
  • 5f1ae4a34b backwards for softcapping (#1033) Phil Wang 2024-07-21 23:25:46 -0700
  • ef3e358a25 remove lambda (#1056) youkaichao 2024-07-21 23:24:38 -0700
  • 4df62e1440 catch typo (#1058) Jorge António 2024-07-22 07:24:15 +0100
  • 74b0761ff7 [FA3] BF16 forward Tri Dao 2024-07-14 23:39:46 -0700
  • 898dd4bbf2 Pass seqused_k to _flash_attn_varlen_forward Tri Dao 2024-07-13 00:08:27 -0700
  • 7ef24848cf Add FA3 image Tri Dao 2024-07-11 09:54:05 -0700
  • 7f67966cc7 FA3 initial code release Tri Dao 2024-07-11 09:53:36 -0700
  • b4a9dd6c9c Temporarily switch to cutlass fork for more shapes Tri Dao 2024-07-11 09:29:21 -0700
  • 7551202cb2 Bump to v2.6.1 Tri Dao 2024-07-11 08:28:32 -0700
  • 844912dca0 [CI] Switch from CUDA 12.2 to 12.3 Tri Dao 2024-07-11 08:20:09 -0700
  • 40e534a7f6 Implement cache_leftpad Tri Dao 2024-07-11 08:17:15 -0700
  • 116b05f9b0 [CI] Compile with pytorch 2.4.0.dev20240514 Tri Dao 2024-07-11 02:53:30 -0700
  • da11d1b853 Bump v2.6.0 Tri Dao 2024-07-10 21:34:58 -0700
  • d0787acc16 Relax dropout_fraction test Tri Dao 2024-07-10 11:49:40 -0700
  • dca6d89da4 Don't support softcap and dropout at the same time Tri Dao 2024-07-10 11:23:12 -0700
  • 81e01efd4b More typo fixes Tri Dao 2024-07-10 10:19:17 -0700
  • 72e27c6320 Fix typo with softcapping Tri Dao 2024-07-10 00:33:52 -0700
  • 3d41db3e2c Only test backward if there's no softcapping Tri Dao 2024-07-10 00:27:45 -0700
  • 908511b2b6 Split into more .cu files to speed up compilation Tri Dao 2024-07-10 00:24:04 -0700
  • 1d536d7de5 Minor cleanup of softcapping Tri Dao 2024-07-09 22:57:03 -0700
  • beb2bf2a32 Drop support for pytorch 1.12, 1.13, and python 3.7 Tri Dao 2024-07-09 22:13:15 -0700
  • f4628b43ec missing commas and backwards return arguments (#1032) Phil Wang 2024-07-09 10:56:29 -0700
  • 8f873cc6ac Implement softcapping. (#1025) Nicolas Patry 2024-07-08 20:24:48 +0200
  • 4e8d60069f Add the return_softmax_lse parameter to the flash_attn_with_kvcache function to allow returning the logsumexp of the attention scores. (#989) Jianwei Dong 2024-07-08 23:29:40 +0800
  • 6df7e0a02e Fix the varlen deterministic test (#1023) muoshuosha 2024-07-04 02:07:57 +0800
  • 9486635c92 Fix typos of comments about shape. (#837) 66RING 2024-07-01 13:40:59 +0800
  • 0d810cfb73 Fix KeyError handling for non-existing key in state_dict.pop() (#898) JDKWangGuan 2024-06-30 22:40:03 -0700
  • 6a2a16e994 fix typo (#974) cao lei 2024-06-30 22:39:39 -0700
  • 5bf201966a Fixing argument checking when using seqlenq_ngroups_swapped. (#976) Nicolas Patry 2024-07-01 07:39:22 +0200