From 765741c1eeb86c96ee71a3291ad6968cfbf4e4a1 Mon Sep 17 00:00:00 2001
From: Dan Fu
Date: Tue, 14 Jun 2022 11:55:14 -0700
Subject: [PATCH] More explanation

---
 README.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 8bfc85c..4435de1 100644
--- a/README.md
+++ b/README.md
@@ -77,7 +77,8 @@ As a result, FlashAttention can scale to much longer sequence lengths.
 We show speedup with head dimension 128.
 Here we show batch size 16 with 12 heads.
-Speedup is less than with the smaller head sizes, but speedup is still significant -- especially with a causal mask.
+Speedup is less than with the smaller head sizes, since we have to make the block size smaller in the tiling.
+But speedup is still significant, especially with a causal mask.
 
 ### RTX 3090
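
For context on the added explanation, here is a rough back-of-envelope sketch of why a larger head dimension forces a smaller block size in the tiling. The SRAM budget, fp16 element size, and the assumption that one block of each of Q, K, V, and the O accumulator is kept on chip are all illustrative assumptions, not the actual FlashAttention kernel configuration.

```python
# Back-of-envelope sketch (assumed numbers, not the real kernel config):
# estimate how many tile rows fit in on-chip SRAM for a given head dimension.

def max_block_rows(head_dim: int,
                   sram_bytes: int = 100 * 1024,  # assumed usable SRAM per thread block
                   bytes_per_elem: int = 2) -> int:  # fp16
    """Rough upper bound on tile rows if blocks of Q, K, V, and the O
    accumulator, each of shape (block_rows, head_dim), are kept in SRAM."""
    bytes_per_row = 4 * head_dim * bytes_per_elem  # one row each of Q, K, V, O
    return sram_bytes // bytes_per_row

for d in (64, 128):
    print(f"head_dim={d:3d}: ~{max_block_rows(d)} rows per tile")
# head_dim= 64: ~200 rows per tile
# head_dim=128: ~100 rows per tile
```

Under these assumptions, doubling the head dimension roughly halves the block size that fits on chip, which is consistent with the patch's note that the speedup at head dimension 128 is smaller than for the smaller head sizes.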