Latest commit:

* Add custom ops for compatibility with PT Compile (see the sketch after this list)
* Add support for varlen functions too
* Add version checks for pytorch API
* Fix PT compile interfaces so it works e2e
* Make sure PT < 2.4 runs fine
* Fix python mistake
* Fix all the autograd magic issues
* Typo on head_dim
* Fix deterministic test failures, remove unneeded detaches()
* Remove test requires_grad
* Resolve all the pytorch versioning issues
* C++ and python refactor to improve padding management for torch.compile()
* Add improvements suggested by @anijain2305
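Several of the bullets above concern registering the kernels as PyTorch custom ops so torch.compile treats them as opaque calls, with version gating so PyTorch < 2.4 still works. The snippet below is a minimal, hypothetical sketch of that general pattern, not the flash-attn code: the op name `mylib::attention_fwd` and the `_attention_reference` helper are made up for illustration, and `scaled_dot_product_attention` stands in for the real fused kernel.

```python
# Sketch only: shows the custom-op + version-check pattern, assuming hypothetical names.
import torch

_torch_version = tuple(int(x) for x in torch.__version__.split(".")[:2])


def _attention_reference(q, k, v):
    # Stand-in for the real fused CUDA kernel; SDPA is used purely for illustration.
    return torch.nn.functional.scaled_dot_product_attention(q, k, v)


if _torch_version >= (2, 4):
    # torch.library.custom_op (new in PyTorch 2.4) registers the function as an
    # opaque op, so torch.compile records a single node instead of tracing into it.
    @torch.library.custom_op("mylib::attention_fwd", mutates_args=())
    def attention_fwd(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
        return _attention_reference(q, k, v)

    # A fake (meta) kernel tells the compiler the output shape/dtype without
    # running the real kernel.
    @attention_fwd.register_fake
    def _(q, k, v):
        return torch.empty_like(q)
else:
    # On PyTorch < 2.4 the custom-op API is unavailable, so fall back to the
    # plain eager function.
    attention_fwd = _attention_reference

# Usage is identical in both branches: out = attention_fwd(q, k, v)
```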
| Name |
|---|
| composable_kernel@a9b170b541 |
| cutlass@756c351b49 |
| flash_attn |
| flash_attn_ck |
| ft_attention |
| fused_dense_lib |
| fused_softmax |
| layer_norm |
| rotary |
| xentropy |