* Add RNG state to kernel launch params Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Save seed and offset for backward Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Single thread write to global mem Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com> * compute_dq_dk_dv_1colblock get seed and offset from launch params Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com> * compute_dq_dk_dv_1rowblock get seed and offset from launch params Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Change forward c++ APIs to save RNG state for backward Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Change backward c++ APIs to set RNG state for bprop launcher Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Bug fixes Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Python side API changes Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Bug fix; only save seeds instead of full offset Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Account for 3D grid size Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com> |
||
|---|---|---|
| .. | ||
| cutlass@c4f6b8c6bc | ||
| flash_attn | ||
| ft_attention | ||
| fused_dense_lib | ||
| fused_softmax | ||
| layer_norm | ||
| rotary | ||
| xentropy | ||