flash-attention/tests
Phil Wang 5f1ae4a34b
backwards for softcapping (#1033)
* check in the two ways of approaching backwards for softcapping, both functional
* prepare the softcap switch for backwards
* temporary
* cleanup to the way Tri prefers
* calculate dtanh when copying from scores -> dtanh Tensor
* no ternary operators allowed for constexpr, so just use some hack found online
* fix maybe_dtanh, restore some files
* restore another file
* move calculate_dtanh to utils and colocate with apply_softcap
* cleanup
* maybe last cleanup
* save for another pr
* remove a stray line
* fix spacing
* fix an issue, and make test_flash_attn.py ready to test softcapping backwards
2024-07-21 23:25:46 -07:00
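
The squashed commits above add the backward pass for softcapping, which passes raw attention scores through softcap * tanh(scores / softcap) and therefore needs a dtanh factor in the gradient. Below is a minimal PyTorch sketch of that math, assuming the standard tanh softcap; apply_softcap and calculate_dtanh mirror the helper names mentioned in the commits but are reference re-implementations for illustration, not the CUDA kernels in this repo.

```python
import torch

def apply_softcap(scores: torch.Tensor, softcap: float) -> torch.Tensor:
    # Forward: squash raw attention scores into (-softcap, softcap).
    return softcap * torch.tanh(scores / softcap)

def calculate_dtanh(scores_capped: torch.Tensor, softcap: float) -> torch.Tensor:
    # d/dx [softcap * tanh(x / softcap)] = 1 - tanh(x / softcap)^2.
    # tanh(x / softcap) equals scores_capped / softcap, so the derivative can be
    # recovered from the capped scores alone, without keeping the raw scores around.
    return 1.0 - (scores_capped / softcap) ** 2

# Gradient check against autograd (arbitrary shapes, float64 for a tight tolerance).
scores = torch.randn(4, 8, dtype=torch.float64, requires_grad=True)
softcap = 30.0
capped = apply_softcap(scores, softcap)
upstream = torch.randn_like(capped)
capped.backward(upstream)
manual = upstream * calculate_dtanh(apply_softcap(scores.detach(), softcap), softcap)
assert torch.allclose(scores.grad, manual)
```
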
Name                 Last commit                                                         Date
layers               Run isort and black on test files                                   2023-08-18 20:59:35 -07:00
losses               return z_loss (#768)                                                2024-01-21 15:23:41 -08:00
models               Add test for BTLM init                                              2023-12-25 15:16:27 -08:00
modules              Run isort and black on test files                                   2023-08-18 20:59:35 -07:00
ops                  [LayerNorm] Rename layernorm.py -> layer_norm.py                    2024-01-05 00:21:03 -08:00
pyproject.toml       Move pyproject.toml to flash-attn and tests dir to avoid PEP 517    2023-08-25 15:05:28 -07:00
test_flash_attn.py   backwards for softcapping (#1033)                                   2024-07-21 23:25:46 -07:00
test_rotary.py       Fix spurious re-compilations of rotary_kernel (#911)                2024-04-05 13:40:41 -07:00