flash-attention/csrc
Kirthi Shankar Sivamani a03f6f8e9e
Enable CUDA graphs (#386)
* Add RNG state to kernel launch params

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
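
A minimal sketch of what this step changes, with struct and field names assumed rather than copied from the repo: eager code can read the generator's seed/offset on the host at launch time, but a CUDA graph replays the launch without rerunning host code, so PyTorch's capturable Philox state has to travel inside the launch params and be resolved on the device.

```cuda
#include <cstdint>
#include <ATen/cuda/CUDAGeneratorImpl.h>  // at::PhiloxCudaState

// Hypothetical launch-params struct (the repo's real one has many more
// fields): the Philox state rides along with the launch params. Under graph
// capture it holds device pointers into the generator's state; eagerly it
// holds plain seed/offset values.
struct Flash_fwd_params {
    // ... tensor pointers, sizes, dropout probability, etc. ...
    at::PhiloxCudaState philox_args;  // capturable RNG state
    uint64_t* rng_state;  // 2-element buffer for the resolved {seed, offset}
};
```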

* Save seed and offset for backward

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
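
Resolving that state on the device is what PyTorch's at::cuda::philox::unpack is for; under capture it dereferences pointers into the generator's state. A sketch of the kernel-side preamble, helper name assumed:

```cuda
#include <cstdint>
#include <ATen/cuda/CUDAGraphsUtils.cuh>  // at::cuda::philox::unpack

// Kernel-side preamble (helper name assumed): resolve the actual seed and
// offset at run time. This works identically whether philox_args carries
// immediate values (eager mode) or pointers (graph capture/replay).
template <typename Params>
__device__ inline void resolve_rng(const Params& params,
                                   uint64_t& seed, uint64_t& offset) {
    auto seed_offset = at::cuda::philox::unpack(params.philox_args);
    seed   = std::get<0>(seed_offset);
    offset = std::get<1>(seed_offset);
}
```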

* Single-thread write to global mem

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
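
All threads resolve identical values, so persisting them for the backward pass needs only one store. A minimal sketch of the guarded write, buffer layout assumed to be {seed, offset}:

```cuda
#include <cstdint>

// One thread in the entire grid writes the resolved {seed, offset} to global
// memory for the backward pass. Every thread holds the same two values, so a
// single guarded store suffices and avoids redundant global-memory traffic.
__global__ void save_rng_state(uint64_t seed, uint64_t offset,
                               uint64_t* rng_state) {
    if (blockIdx.x == 0 && blockIdx.y == 0 && blockIdx.z == 0 &&
        threadIdx.x == 0 && threadIdx.y == 0 && threadIdx.z == 0) {
        rng_state[0] = seed;
        rng_state[1] = offset;
    }
}
```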

* compute_dq_dk_dv_1colblock: get seed and offset from launch params (shared sketch below)

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* compute_dq_dk_dv_1rowblock: get seed and offset from launch params

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
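
Both backward kernel variants above then take seed and offset from the saved rng_state buffer rather than querying the generator, so the forward pass's dropout mask is regenerated bit-for-bit. A sketch using curand's Philox as a stand-in for flash-attention's own Philox implementation; the subsequence formula is an assumption:

```cuda
#include <cstdint>
#include <curand_kernel.h>

// Backward-side re-seeding: seed and offset come from the buffer the forward
// kernel filled, never from the generator.
__global__ void bwd_dropout_rng_sketch(const uint64_t* rng_state) {
    // Per-thread subsequence: must match the forward pass's indexing exactly
    // so the identical dropout mask is regenerated (formula assumed).
    unsigned long long subseq =
        static_cast<unsigned long long>(blockIdx.x) * blockDim.x + threadIdx.x;
    curandStatePhilox4_32_10_t state;
    curand_init(/*seed=*/rng_state[0], subseq, /*offset=*/rng_state[1], &state);
    // ... regenerate the dropout mask while accumulating dQ, dK, dV ...
}
```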

* Change forward c++ APIs to save RNG state for backward

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
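
Host-side, the forward API's job becomes: request a capturable Philox state under the generator lock, and hand the kernel a small buffer to fill with the resolved values, which is then returned to the caller for backward. A sketch with assumed helper and struct names (Flash_fwd_params as sketched earlier):

```cuda
#include <cstdint>
#include <mutex>
#include <ATen/cuda/CUDAGeneratorImpl.h>
#include <torch/extension.h>

// Hypothetical forward-API helper. counter_offset bounds how much Philox
// randomness one launch may consume.
torch::Tensor prepare_fwd_rng(Flash_fwd_params& params,
                              uint64_t counter_offset) {
    auto gen = at::get_generator_or_default<at::CUDAGeneratorImpl>(
        c10::nullopt, at::cuda::detail::getDefaultCUDAGenerator());
    {
        // The generator mutex must be held while advancing its state.
        std::lock_guard<std::mutex> lock(gen->mutex_);
        params.philox_args = gen->philox_cuda_state(counter_offset);
    }
    // Buffer the kernel fills with the resolved {seed, offset}; returned to
    // Python alongside the outputs and saved for backward.
    auto rng_state = torch::empty(
        {2}, torch::dtype(torch::kInt64).device(torch::kCUDA));
    params.rng_state =
        reinterpret_cast<uint64_t*>(rng_state.data_ptr<int64_t>());
    return rng_state;
}
```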

* Change backward c++ APIs to set RNG state for bprop launcher

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
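
The backward API is the mirror image: no generator access at all, just wiring the saved buffer into the launch params, which is what makes the backward launch safe to capture and replay. A sketch, names assumed:

```cuda
#include <cstdint>
#include <torch/extension.h>

// Hypothetical backward launch params, mirroring the forward struct's field.
struct Flash_bwd_params {
    // ... gradient pointers, sizes, etc. ...
    uint64_t* rng_state;  // saved {seed, offset} from the forward pass
};

// Point the bprop launcher at the saved forward RNG state. No generator
// query happens here, so a captured backward graph replays with exactly the
// recorded seed/offset.
void set_bwd_rng(Flash_bwd_params& params, const torch::Tensor& rng_state) {
    TORCH_CHECK(rng_state.is_cuda() && rng_state.numel() == 2,
                "expected the 2-element CUDA tensor saved by the forward pass");
    params.rng_state =
        reinterpret_cast<uint64_t*>(rng_state.data_ptr<int64_t>());
}
```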

* Bug fixes

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Python side API changes

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Bug fix; only save seeds instead of full offset

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* Account for 3D grid size

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
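
The launch grid is 3D (roughly blocks × heads × batch), so any per-thread Philox subsequence has to fold in all three block indices; flattening only blockIdx.x would alias RNG streams across the other two grid dimensions. A sketch of the kind of indexing meant here, formula assumed:

```cuda
// Fold all three block indices into a flat id so no two threads in the
// launch share a Philox stream.
__device__ inline unsigned long long philox_subsequence() {
    unsigned long long block_id =
        (static_cast<unsigned long long>(blockIdx.z) * gridDim.y + blockIdx.y)
            * gridDim.x + blockIdx.x;
    return block_id * blockDim.x + threadIdx.x;
}
```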

---------

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
2023-07-27 16:11:34 -07:00
cutlass@c4f6b8c6bc FlashAttention-2 release 2023-07-17 06:21:34 -07:00
flash_attn Enable CUDA graphs (#386) 2023-07-27 16:11:34 -07:00
ft_attention [FT] Implement MQA/GQA 2023-07-22 23:47:01 -07:00
fused_dense_lib [FusedDense] Allocate lt_workspace on input device 2023-05-30 14:17:26 -07:00
fused_softmax Add Megatron attention implementation for benchmarking 2022-10-23 23:04:16 -07:00
layer_norm Fix random state for dropout_layer_norm (#315) 2023-07-23 15:05:13 -07:00
rotary Support H100 for other CUDA extensions 2023-03-15 16:59:27 -07:00
xentropy Support H100 for other CUDA extensions 2023-03-15 16:59:27 -07:00