diff --git a/README.md b/README.md
index 8bfc85c..4435de1 100644
--- a/README.md
+++ b/README.md
@@ -77,7 +77,8 @@ As a result, FlashAttention can scale to much longer sequence lengths.
 We show speedup with head dimension 128.
 Here we show batch size 16 with 12 heads.
-Speedup is less than with the smaller head sizes, but speedup is still significant -- especially with a causal mask.
+Speedup is less than with the smaller head sizes, since we have to make the block size smaller in the tiling.
+But speedup is still significant, especially with a causal mask.
 
 ### RTX 3090
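
To see why a larger head dimension forces smaller tiles, here is a minimal arithmetic sketch based on the block-size choice in the FlashAttention paper (Bc = ceil(M / (4d)), Br = min(Bc, d), with M the on-chip SRAM budget in elements). The SRAM budget below is an assumed illustrative number, not a measured A100 figure, and this is not the actual kernel code.

```python
import math

def block_sizes(sram_elements: int, head_dim: int):
    # Block-size rule from the FlashAttention paper:
    # Bc = ceil(M / (4 d)), Br = min(Bc, d)
    bc = math.ceil(sram_elements / (4 * head_dim))
    br = min(bc, head_dim)
    return br, bc

# Assumed SRAM budget of ~24K elements per thread block (illustrative only).
M = 24 * 1024
for d in (64, 128):
    br, bc = block_sizes(M, d)
    print(f"head_dim={d}: Br={br}, Bc={bc}")
# head_dim=64:  Br=64, Bc=96
# head_dim=128: Br=48, Bc=48  -> smaller tiles at d=128, hence less speedup
```

Doubling the head dimension roughly halves the number of rows and columns that fit in a tile, so each pass over K/V does less work per byte loaded, which is consistent with the smaller (but still significant) speedup reported above.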