From 40a25c8ee7465cf547b929cfa2937034e37bfce9 Mon Sep 17 00:00:00 2001
From: Tri Dao
Date: Wed, 17 May 2023 08:32:26 -0700
Subject: [PATCH] Update roadmap

---
 README.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index ccc022b..70529b5 100644
--- a/README.md
+++ b/README.md
@@ -37,6 +37,10 @@
 As Triton is a higher-level language than CUDA, it might be easier to
 understand and experiment with. The notations in the Triton implementation are
 also closer to what's used in our paper.
+
+We also have an experimental implementation in Triton that support attention
+bias (e.g. ALiBi):
+https://github.com/HazyResearch/flash-attention/blob/main/flash_attn/flash_attn_triton.py
 
 ## Installation and features
 
@@ -76,10 +80,6 @@ Our tentative roadmap:
 6. ~~[Jul 2022] Support head dimension 128~~[Done].
 7. ~~[Aug 2022] Fuse rotary embedding~~[Done].
 8. ~~[Mar 2023] Support SM90 GPUs (H100)~~[Done].
-9. [Apr 2023] Refactor to use Cutlass 3.x.
-10. [May 2023] Support attention bias (e.g. ALiBi, relative positional encoding).
-11. [Jun 2023] Support SM70 GPUs (V100).
-12. [Jun 2023] Support fp8 (H100).
 
 ## How to use FlashAttention
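
For readers who want to try the attention-bias path this patch points to, here is a minimal, hedged sketch of building an ALiBi-style additive bias and passing it to `flash_attn_func` from `flash_attn/flash_attn_triton.py`. The call signature (q/k/v laid out as `(batch, seqlen, nheads, headdim)` in fp16, plus a `bias` keyword broadcastable to `(batch, nheads, seqlen_q, seqlen_k)`) is an assumption inferred from the linked file, not something this patch specifies; treat it as illustration only.

```python
# Hedged sketch: calling the experimental Triton kernel with an ALiBi bias.
# Assumptions (not guaranteed by this patch): flash_attn_func(q, k, v, bias=..., causal=...)
# takes q/k/v of shape (batch, seqlen, nheads, headdim) and an additive bias
# broadcastable to (batch, nheads, seqlen_q, seqlen_k).
import torch
from flash_attn.flash_attn_triton import flash_attn_func

def alibi_slopes(nheads: int) -> torch.Tensor:
    # Geometric per-head slopes from the ALiBi paper; assumes nheads is a power of two.
    start = 2.0 ** (-8.0 / nheads)
    return torch.tensor([start ** (h + 1) for h in range(nheads)])

batch, seqlen, nheads, headdim = 2, 1024, 8, 64
q, k, v = (torch.randn(batch, seqlen, nheads, headdim, device="cuda", dtype=torch.float16)
           for _ in range(3))

# Additive ALiBi bias: -slope_h * |i - j|, shaped (1, nheads, seqlen, seqlen)
# so it broadcasts over the batch dimension.
pos = torch.arange(seqlen, device="cuda")
distance = (pos[None, :] - pos[:, None]).abs().half()
bias = -alibi_slopes(nheads).to(device="cuda", dtype=torch.float16)[None, :, None, None] * distance

out = flash_attn_func(q, k, v, bias=bias, causal=True)  # (batch, seqlen, nheads, headdim)
```

Even if the exact keyword names differ from this sketch, the bias construction above shows the shape and sign convention an additive ALiBi bias would need.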