Bump to v2.7.0

Tri Dao 2024-11-12 14:11:44 -08:00
parent 6ffeb572b1
commit c555642172
2 changed files with 5 additions and 1 deletion

README.md

@@ -373,6 +373,10 @@ Thanks to @beginlner for this contribution.
 Support attention with softcapping, as used in Gemma-2 and Grok models.
 Thanks to @Narsil and @lucidrains for this contribution.
+### 2.7: Compatibility with torch compile
+Thanks to @ani300 for this contribution.
 ## Performance
 We present expected speedup (combined forward + backward pass) and memory savings from using FlashAttention against PyTorch standard attention, depending on sequence length, on different GPUs (speedup depends on memory bandwidth - we see more speedup on slower GPU memory).
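The 2.7 entry added above notes compatibility with torch compile. A minimal sketch of what that enables, not taken from the commit: calling `flash_attn_func` inside a `torch.compile`'d function. The tensor shapes, dtype, and `causal=True` setting below are illustrative assumptions.

```python
# Minimal sketch (illustrative, not from the commit): flash_attn_func used
# inside a torch.compile'd function. Assumes a CUDA device and fp16 inputs
# of shape (batch, seqlen, nheads, headdim), which flash_attn_func expects.
import torch
from flash_attn import flash_attn_func


@torch.compile
def attention(q, k, v):
    # Causal attention; softmax scale defaults to 1/sqrt(headdim).
    return flash_attn_func(q, k, v, causal=True)


q, k, v = (torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)
           for _ in range(3))
out = attention(q, k, v)  # shape (2, 1024, 8, 64)
```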

flash_attn/__init__.py

@@ -1,4 +1,4 @@
-__version__ = "2.6.3"
+__version__ = "2.7.0"
 from flash_attn.flash_attn_interface import (
     flash_attn_func,
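As a usage note (not part of the diff), the `__version__` string bumped here is what the installed package reports, so an environment can be checked against this release like so:

```python
import flash_attn

# Prints the packaged version string; "2.7.0" once this release is installed.
print(flash_attn.__version__)
```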