flash-attention/csrc
Tri Dao 1aa6d7d9b6 Rework dropout to decouple forward and backward
They don't have to have the same block size, number of threads, etc.
2022-10-21 12:04:27 -07:00
flash_attn Rework dropout to decouple forward and backward 2022-10-21 12:04:27 -07:00