cutlass/tools/library
dan_the_3rd 9b8166e3f0
fMHA: Add backward pass (#844)
* fMHA: Add backward pass

* Better checks for strides/alignments

* Remove fb-internal URL

* torch.Tensor.untyped_storage requires pytorch 2.0+

* minor changes

* make test

---------

Co-authored-by: danthe3rd <danthe3rd>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2023-04-06 20:44:58 -04:00
include/cutlass/library  Fix typos 2 (#842)                                                           2023-03-09 23:22:56 -05:00
scripts                  Add tile_n=32 and tile_k=32 kernels in generator.py (#858)                   2023-04-06 10:00:52 -04:00
src                      fix split_k_mode and add reduction kernel for f16 input/accum/output (#896)  2023-03-30 15:31:08 -04:00
CMakeLists.txt           fMHA: Add backward pass (#844)                                               2023-04-06 20:44:58 -04:00