flash-attention/csrc/stream_attn/README.md
2022-05-20 14:21:58 -07:00

Our implementation uses the FMHA (fused multi-head attention) code from NVIDIA's Apex library as a starting point.

We thank Young-jun Ko for the in-depth explanation of his FMHA implementation and for his thoughtful answers to our questions about CUDA.