cutlass/examples/41_fused_multi_head_attention/gemm

Latest commit 9b8166e3f0 by dan_the_3rd: fMHA: Add backward pass (#844)
* fMHA: Add backward pass
* Better checks for strides/alignments
* Remove fb-internal URL
* torch.Tensor.untyped_storage requires PyTorch 2.0+
* Minor changes
* make test

Co-authored-by: danthe3rd <danthe3rd>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>

2023-04-06 20:44:58 -04:00
File                          Last commit                          Date
custom_mma_base.h             New updates for 2.11 (#775)          2023-01-20 16:32:57 -05:00
custom_mma_multistage.h       New updates for 2.11 (#775)          2023-01-20 16:32:57 -05:00
custom_mma_pipelined.h        Fix typos 2 (#842)                   2023-03-09 23:22:56 -05:00
custom_mma.h                  New updates for 2.11 (#775)          2023-01-20 16:32:57 -05:00
find_default_mma.h            fMHA: Sync FW with xFormers (#828)   2023-02-22 23:25:31 -05:00
mma_accum_lambda_iterator.h   fMHA: Sync FW with xFormers (#828)   2023-02-22 23:25:31 -05:00
mma_from_smem.h               fMHA: Add backward pass (#844)       2023-04-06 20:44:58 -04:00