More explanation

Dan Fu 2022-06-14 11:55:14 -07:00
parent 2d5b2483b8
commit 765741c1ee

@@ -77,7 +77,8 @@ As a result, FlashAttention can scale to much longer sequence lengths.
We show speedup with head dimension 128.
Here we show batch size 16 with 12 heads.
-Speedup is less than with the smaller head sizes, but speedup is still significant -- especially with a causal mask.
+Speedup is less than with the smaller head sizes, since we have to make the block size smaller in the tiling.
+But speedup is still significant, especially with a causal mask.
### RTX 3090
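
The new wording attributes the reduced speedup at head dimension 128 to the smaller tile size forced by the tiling. A minimal sketch of that trade-off, following the block-size choice in Algorithm 1 of the FlashAttention paper (B_c = ceil(M / (4d)), B_r = min(B_c, d), with M the on-chip SRAM budget in elements); the `block_sizes` helper and the 100 KB SRAM figure are illustrative assumptions, not values taken from the actual kernels:

```python
import math

def block_sizes(sram_elems: int, head_dim: int) -> tuple[int, int]:
    """Return (B_r, B_c) tile sizes for a given SRAM budget and head dimension.

    Follows the paper's choice: B_c = ceil(M / (4 d)), B_r = min(B_c, d).
    """
    b_c = math.ceil(sram_elems / (4 * head_dim))
    b_r = min(b_c, head_dim)
    return b_r, b_c

# Assumed SRAM budget of 100 KB of fp32 elements (purely illustrative).
M = 100 * 1024 // 4
for d in (64, 128):
    print(f"head_dim={d}: (B_r, B_c) = {block_sizes(M, d)}")
# Doubling the head dimension from 64 to 128 halves B_c, so each tile covers
# fewer keys/values for the same on-chip memory, which is why the measured
# speedup at head dimension 128 is smaller than at the smaller head sizes.
```

The actual CUDA kernels tune their tile sizes per architecture rather than using this formula directly; the sketch only illustrates why a larger head dimension forces smaller tiles for a fixed amount of on-chip memory.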