[library](https://www.mosaicml.com/blog/gpt-3-quality-for-500k). Composer is a
library for efficient neural network training.
## MLPerf benchmarks
[MLPerf](https://mlcommons.org/en/) is a competitive machine learning performance benchmark. FlashAttention
yields the fastest BERT training on cloud instances in MLPerf training 2.0 (June
2022) and MLPerf training 2.1 (November 2022).
- MLPerf 2.0: IEEE Spectrum [article](https://spectrum.ieee.org/mlperf-rankings-2022) about our submission to the MLPerf 2.0 benchmark using FlashAttention.
- MLPerf 2.1: a collaboration between [Azure and Hazy Research](https://techcommunity.microsoft.com/t5/azure-high-performance-computing/azure-collaborates-with-hazy-research-and-nvidia-to-achieve/ba-p/3667511) that produced the fastest MLPerf BERT training on cloud instances.
## Other models
- [Uni-Fold](https://github.com/dptech-corp/Uni-Fold): an open-source platform for developing protein models beyond AlphaFold. With FlashAttention, Uni-Fold is 2.6x [faster](https://twitter.com/guolin_ke/status/1580532071901995008) than AlphaFold.
- Our own Stable Diffusion [fork](https://twitter.com/realDanFu/status/1580641495991754752) uses FlashAttention to get a 3-4x speedup over the original version.
## Different implementations
- [Triton](https://github.com/openai/triton): an [implementation](https://github.com/openai/triton/blob/master/python/tutorials/06-fused-attention.py) of
FlashAttention in Triton by Phil Tillet from OpenAI. Triton is a Python-based
language and compiler for parallel programming.
- [xformers](https://github.com/facebookresearch/xformers): The xformers team has implemented [memory-efficient attention](https://twitter.com/fvsmassa/status/1580229170629849089) in a similar spirit to FlashAttention; a usage sketch follows this list.
- [JAX](https://github.com/google/jax): an [implementation](https://github.com/lucidrains/flash-attention-jax) of FlashAttention in JAX by [lucidrains](https://github.com/lucidrains/).
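
For the xformers implementation above, here is a minimal usage sketch (an illustration, not taken from the xformers docs): it assumes a recent xformers release exposing `xformers.ops.memory_efficient_attention` and a CUDA GPU, and checks the result against a naive PyTorch attention reference.

```python
# Minimal sketch: xformers memory-efficient attention vs. a naive reference.
# Assumes xformers is installed and a CUDA GPU is available; shapes are illustrative.
import torch
import xformers.ops as xops

B, M, H, K = 2, 1024, 8, 64  # batch, sequence length, heads, head dimension
q, k, v = (torch.randn(B, M, H, K, device="cuda", dtype=torch.float16)
           for _ in range(3))

# Memory-efficient attention avoids materializing the full (M x M) attention matrix.
out = xops.memory_efficient_attention(q, k, v)

# Naive reference, computed per head in (B, H, M, K) layout.
qh, kh, vh = (t.transpose(1, 2) for t in (q, k, v))
attn = torch.softmax(qh @ kh.transpose(-2, -1) / K ** 0.5, dim=-1)
ref = (attn @ vh).transpose(1, 2)

print(torch.allclose(out, ref, atol=1e-2, rtol=1e-2))  # loose fp16 tolerance
```

All of the implementations listed here compute the same attention output; they differ only in how the kernels are written and scheduled.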