From 40a25c8ee7465cf547b929cfa2937034e37bfce9 Mon Sep 17 00:00:00 2001
From: Tri Dao
Date: Wed, 17 May 2023 08:32:26 -0700
Subject: [PATCH] Update roadmap

---
 README.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index ccc022b..70529b5 100644
--- a/README.md
+++ b/README.md
@@ -37,6 +37,10 @@
 As Triton is a higher-level language than CUDA, it might be easier to
 understand and experiment with. The notations in the Triton implementation are
 also closer to what's used in our paper.
+
+We also have an experimental implementation in Triton that support attention
+bias (e.g. ALiBi):
+https://github.com/HazyResearch/flash-attention/blob/main/flash_attn/flash_attn_triton.py
 
 ## Installation and features
 
@@ -76,10 +80,6 @@ Our tentative roadmap:
 6. ~~[Jul 2022] Support head dimension 128~~[Done].
 7. ~~[Aug 2022] Fuse rotary embedding~~[Done].
 8. ~~[Mar 2023] Support SM90 GPUs (H100)~~[Done].
-9. [Apr 2023] Refactor to use Cutlass 3.x.
-10. [May 2023] Support attention bias (e.g. ALiBi, relative positional encoding).
-11. [Jun 2023] Support SM70 GPUs (V100).
-12. [Jun 2023] Support fp8 (H100).
 
 ## How to use FlashAttention
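
For readers who want to try the attention-bias path this patch points to, here is a minimal, hedged sketch of building an ALiBi-style additive bias and passing it to `flash_attn_func` from `flash_attn/flash_attn_triton.py`. The call signature (q/k/v laid out as `(batch, seqlen, nheads, headdim)` in fp16, plus a `bias` keyword broadcastable to `(batch, nheads, seqlen_q, seqlen_k)`) is an assumption inferred from the linked file, not something this patch specifies; treat it as illustration only.

```python
# Hedged sketch: calling the experimental Triton kernel with an ALiBi bias.
# Assumptions (not guaranteed by this patch): flash_attn_func(q, k, v, bias=..., causal=...)
# takes q/k/v of shape (batch, seqlen, nheads, headdim) and an additive bias
# broadcastable to (batch, nheads, seqlen_q, seqlen_k).
import torch
from flash_attn.flash_attn_triton import flash_attn_func

def alibi_slopes(nheads: int) -> torch.Tensor:
    # Geometric per-head slopes from the ALiBi paper; assumes nheads is a power of two.
    start = 2.0 ** (-8.0 / nheads)
    return torch.tensor([start ** (h + 1) for h in range(nheads)])

batch, seqlen, nheads, headdim = 2, 1024, 8, 64
q, k, v = (torch.randn(batch, seqlen, nheads, headdim, device="cuda", dtype=torch.float16)
           for _ in range(3))

# Additive ALiBi bias: -slope_h * |i - j|, shaped (1, nheads, seqlen, seqlen)
# so it broadcasts over the batch dimension.
pos = torch.arange(seqlen, device="cuda")
distance = (pos[None, :] - pos[:, None]).abs().half()
bias = -alibi_slopes(nheads).to(device="cuda", dtype=torch.float16)[None, :, None, None] * distance

out = flash_attn_func(q, k, v, bias=bias, causal=True)  # (batch, seqlen, nheads, headdim)
```

Even if the exact keyword names differ from this sketch, the bias construction above shows the shape and sign convention an additive ALiBi bias would need.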