Ying Zhang | db80387343 | Add seqused_q in fwd / bwd and seqused_k in bwd. | 2024-09-16 14:24:11 -07:00
jayhshah | c92ca63268 | FA3 FP8 qkv descales + restore max offset for h128 causal + added sync for producer WG (#1173) | 2024-08-25 12:18:04 -07:00
Tri Dao | bafe253042 | [FA3] Bwd | 2024-08-01 01:57:06 -07:00
Ying Zhang | dfe1a59e4b | Add var-seq-len to FA3 fp16 / bf16 fwd (#1072) (fwd var-seq-len; fixes; benchmark; fixes; Co-authored-by: Tri Dao <tridao@users.noreply.github.com>) | 2024-07-22 21:32:41 -07:00
Cameron Shinn | cb516f855b | Remove torchlib dependency from cpp files (#1083) | 2024-07-22 16:47:09 -07:00
Tri Dao | 74b0761ff7 | [FA3] BF16 forward | 2024-07-14 23:39:46 -07:00
Tri Dao | 7f67966cc7 | FA3 initial code release | 2024-07-11 09:53:36 -07:00