README typo

Dan Fu 2022-05-27 22:38:20 +01:00
parent dc6d130088
commit 4decc3c166

@@ -35,6 +35,7 @@ We display FlashAttention speedup using these parameters (similar to BERT-base):
 * Batch size 8
 * Head dimension 64
 * 12 attention heads
+Our graphs show sequence lengths between 128 and 4096 (when standard attention runs out of memory on an A100), but FlashAttention can scale up to sequence length 64K.
 #### Speedup
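
For reference, a minimal PyTorch sketch (not the repo's benchmark code) of the shapes these README parameters imply, using a plain materialized-attention baseline. It illustrates why standard attention runs out of A100 memory near sequence length 4096: the `seqlen x seqlen` score matrix grows quadratically, while FlashAttention never materializes it.

```python
import torch

# Parameters from the README (similar to BERT-base).
batch, nheads, headdim = 8, 12, 64

for seqlen in [128, 512, 1024, 2048, 4096]:
    q = torch.randn(batch, nheads, seqlen, headdim,
                    device="cuda", dtype=torch.float16)
    k = torch.randn_like(q)
    v = torch.randn_like(q)

    # Standard attention materializes the full (seqlen x seqlen) matrix:
    # at seqlen=4096 that is 8 * 12 * 4096^2 ~ 1.6e9 fp16 elements,
    # ~3.2 GB per copy (and training keeps several such copies).
    scores = q @ k.transpose(-2, -1) / headdim**0.5
    attn = scores.softmax(dim=-1)
    out = attn @ v
    print(seqlen, out.shape)
```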