diff --git a/README.md b/README.md
index 9e996bf..c092366 100644
--- a/README.md
+++ b/README.md
@@ -9,7 +9,7 @@ Paper: https://arxiv.org/abs/2205.14135
 
 ## Alpha release (0.1).
 
-To compile (requiring CUDA 11, NVCC, and an Ampere GPU):
+To compile (requiring CUDA 11, NVCC, and a Turing or Ampere GPU):
 ```
 python setup.py install
 ```
@@ -22,7 +22,7 @@ PYTHONPATH=$PWD python benchmarks/benchmark_flash_attention.py
 ```
 
 FlashAttention currently supports:
-1. Ampere GPUs (e.g., A100, RTX 3090).
+1. Turing or Ampere GPUs (e.g., A100, RTX 3090, T4, RTX 2080).
 2. fp16.
 3. Head dimensions 16, 32, 64.
 
@@ -30,7 +30,7 @@ Our tentative roadmap:
 1. [Jun 2022] Make package pip-installable.
 2. ~~[Jun 2022] Support SM86 GPUs (e.g., RTX 3080, 3090)~~[Done].
 3. [Jun 2022] Refactor to use Cutlass.
-4. [Jun 2022] Support SM75 GPUs (e.g. T4).
+4. ~~[Jun 2022] Support SM75 GPUs (e.g. T4)~~[Done].
 5. [Jun 2022] Support bf16.
 6. [Jul 2022] Support head dimension 128.
 7. [Jul 2022] Support SM70 GPUs (V100).
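As an illustration only (not part of the patch), below is a minimal Python sketch of the support matrix this change documents: it uses PyTorch's `torch.cuda.get_device_capability` to check that the current GPU is Turing (SM75) or Ampere (SM8x), and that a packed QKV tensor is fp16 with one of the supported head dimensions. The helper name `flash_attention_supported` and the tensor layout are assumptions made for the example, not part of the FlashAttention API.

```python
# Sketch (not from the patch): check the support matrix the updated README
# lists -- Turing/Ampere GPUs, fp16, head dimensions 16/32/64.
import torch

SUPPORTED_HEAD_DIMS = {16, 32, 64}  # per the README's support list


def flash_attention_supported(qkv: torch.Tensor) -> bool:
    """Return True if `qkv` fits the alpha release's stated constraints.

    `qkv` is assumed to be a CUDA fp16 tensor whose last dimension is the
    head dimension (e.g. shape (batch, seqlen, nheads, head_dim)).
    """
    if not qkv.is_cuda or qkv.dtype != torch.float16:
        return False
    major, minor = torch.cuda.get_device_capability(qkv.device)
    # SM75 = Turing (e.g. T4, RTX 2080); SM8x = Ampere (e.g. A100, RTX 3090).
    is_turing_or_ampere = (major, minor) == (7, 5) or major == 8
    return is_turing_or_ampere and qkv.shape[-1] in SUPPORTED_HEAD_DIMS


if __name__ == "__main__":
    # Example configuration: batch 2, seqlen 128, 8 heads, head dim 64, fp16.
    qkv = torch.randn(2, 128, 8, 64, dtype=torch.float16, device="cuda")
    print(flash_attention_supported(qkv))
```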