* piercefreeman-feature/demo-wheels: (25 commits)
Install standard non-wheel package
Remove release creation
Build wheel on each push
Isolate 2.0.0 & cuda12
Clean setup.py imports
Remove builder project
Bump version
Add notes to github action workflow
Add torch dependency to final build
Exclude cuda erroring builds
Exclude additional disallowed matrix params
Full version matrix
Add CUDA 11.7
Release is actually unsupported
echo OS version
Temp disable deploy
OS version build numbers
Restore full build matrix
Refactor and clean up setup.py
Strip cuda name from torch version
...
* 'feature/demo-wheels' of https://github.com/piercefreeman/flash-attention: (25 commits)
* Add RNG state to kernel launch params
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Save seed and offset for backward
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Single thread write to global mem
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* compute_dq_dk_dv_1colblock get seed and offset from launch params
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* compute_dq_dk_dv_1rowblock get seed and offset from launch params
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Change forward C++ APIs to save RNG state for backward
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Change backward C++ APIs to set RNG state for bprop launcher
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Bug fixes
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Python side API changes
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Bug fix; only save seeds instead of full offset
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Account for 3D grid size
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>