Kai Londenberg
284e2c6e5b
Make FA3 paged attention ready for upgrade to Cutlass 3.6 (#1331)
2024-11-12 11:31:37 -08:00
Kai Londenberg
b443207c1f
Paged Attention support for FA3 (#1268)
2024-11-09 17:05:01 -08:00
jayhshah
a5a75274bc
FA3 kvcache + split kv + gqa parallelization (#1236)
2024-10-15 00:21:22 -07:00
Ying Zhang
dff976a84a
fixes
2024-09-16 15:44:44 -07:00
Ying Zhang
7b4e68e04f
hopper local attention
2024-09-16 14:59:22 -07:00
Ying Zhang
db80387343
Add seqused_q in fwd / bwd and seqused_k in bwd.
2024-09-16 14:24:11 -07:00
Charlene Yang
bdf733be55
Add q, k, v descales to FA3 interface (#1210)
* add descale_q/k/v for fp8 fwd
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix .apply args
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
---------
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
2024-09-09 21:53:52 -07:00
jayhshah
c92ca63268
FA3 FP8 qkv descales + restore max offset for h128 causal + added sync for producer WG (#1173)
2024-08-25 12:18:04 -07:00
Tri Dao
bafe253042
[FA3] Bwd
2024-08-01 01:57:06 -07:00
Tri Dao
3aae9c18c1
Revert "Changes For FP8 (#1075)"
This reverts commit 1899c970c8.
2024-07-25 01:28:44 -07:00
ganeshcolfax
1899c970c8
Changes For FP8 (#1075)
* adding files for fp8 changes.
* removed contiguous check.
* enable all tests except odd-seq-lengths, where it crashes now.
* undid clang formatting.
* change to correct tile size for headdim=128.
* fixed odd-seq-len-k.
* minor formatting.
* minor reformatting.
---------
Co-authored-by: Tri Dao <tridao@users.noreply.github.com>
2024-07-23 13:51:14 -07:00
Ying Zhang
dfe1a59e4b
Add var-seq-len to FA3 fp16 / bf16 fwd (#1072)
* fwd var-seq-len
* fixes
* benchmark
* fixes
---------
Co-authored-by: Tri Dao <tridao@users.noreply.github.com>
2024-07-22 21:32:41 -07:00
youkaichao
ef3e358a25
remove lambda (#1056)
2024-07-21 23:24:38 -07:00
Tri Dao
7f67966cc7
FA3 initial code release
2024-07-11 09:53:36 -07:00