Speedup graph for A100, d128
commit 2d5b2483b8 (parent 5d07483bbc)
@@ -71,6 +71,14 @@ Memory savings are proportional to sequence length -- since standard attention h
We see 10X memory savings at sequence length 2K, and 20X at 4K.
As a result, FlashAttention can scale to much longer sequence lengths.
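
To see where the savings come from: standard attention materializes the seqlen x seqlen score matrix for every batch element and head, so its extra memory is quadratic in sequence length, while FlashAttention only keeps activations that are linear in it. The sketch below is a back-of-the-envelope illustration of that scaling, not a reproduction of the measured 10X/20X figures (the measured savings also include other intermediates avoided in the backward pass); the batch size, head count, head dimension, and fp16 storage are illustrative assumptions.

```python
# Back-of-the-envelope estimate, not a measurement: standard attention stores the
# (seqlen x seqlen) attention matrix for every batch element and head, while the
# Q/K/V activations that both methods keep are only linear in seqlen, so the ratio
# between the two grows roughly linearly with seqlen.

def attn_matrix_bytes(batch, heads, seqlen, bytes_per_el=2):
    # fp16 attention scores: batch * heads * seqlen * seqlen elements
    return batch * heads * seqlen * seqlen * bytes_per_el


def qkv_bytes(batch, heads, seqlen, headdim, bytes_per_el=2):
    # Q, K, V activations, linear in seqlen
    return 3 * batch * heads * seqlen * headdim * bytes_per_el


for seqlen in (1024, 2048, 4096):
    quadratic = attn_matrix_bytes(batch=16, heads=12, seqlen=seqlen)
    linear = qkv_bytes(batch=16, heads=12, seqlen=seqlen, headdim=128)
    print(f"seqlen={seqlen:5d}  attn matrix {quadratic / 2**30:5.2f} GiB  "
          f"QKV {linear / 2**30:5.2f} GiB  ratio {quadratic / linear:5.1f}x")
```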
#### Head Dimension 128
![FlashAttention speedup on A100, head dimension 128](assets/flashattn_speedup_a100_d128.jpg)
We show speedup with head dimension 128.
Here we use batch size 16 with 12 heads.
Speedup is smaller than with the smaller head dimensions, but it is still significant -- especially with a causal mask.
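
For readers who want to reproduce this kind of comparison, the following is a minimal timing sketch, not the repo's benchmark script. It assumes a CUDA GPU, PyTorch, and the `flash_attn_func` interface exported by recent flash-attn releases (older releases expose a different, unpadded packed-QKV interface, so the import and signature may differ); the sequence length of 2048 is an illustrative choice.

```python
import math

import torch
# Assumes the flash_attn_func API of recent flash-attn releases.
from flash_attn import flash_attn_func


def standard_attention(q, k, v, causal):
    """Reference attention that materializes the full (seqlen x seqlen) score matrix.

    q, k, v: (batch, seqlen, nheads, headdim)
    """
    q, k, v = (x.transpose(1, 2) for x in (q, k, v))  # -> (batch, nheads, seqlen, headdim)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1])
    if causal:
        seqlen = q.shape[-2]
        mask = torch.triu(torch.ones(seqlen, seqlen, dtype=torch.bool, device=q.device), 1)
        scores = scores.masked_fill(mask, float("-inf"))
    return (scores.softmax(dim=-1) @ v).transpose(1, 2)


def time_ms(fn, iters=30, warmup=5):
    """Average wall-clock time of fn() in milliseconds, using CUDA events."""
    for _ in range(warmup):
        fn()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters


# Head-dimension-128 setting from the text: batch 16, 12 heads.
# The sequence length below is an illustrative choice.
batch, nheads, headdim, seqlen = 16, 12, 128, 2048
q, k, v = (torch.randn(batch, seqlen, nheads, headdim, dtype=torch.float16, device="cuda")
           for _ in range(3))

for causal in (False, True):
    t_std = time_ms(lambda: standard_attention(q, k, v, causal))
    t_flash = time_ms(lambda: flash_attn_func(q, k, v, causal=causal))
    print(f"causal={causal}: standard {t_std:.2f} ms, FlashAttention {t_flash:.2f} ms, "
          f"speedup {t_std / t_flash:.1f}x")
```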
### RTX 3090
For the RTX 3090, we use batch size 12 with 12 attention heads.
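The timing sketch above can be adapted to this setting by changing `batch` to 12 and keeping `nheads` at 12.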
BIN  assets/flashattn_speedup_a100_d128.jpg  (new file, 125 KiB; binary file not shown)