flash-attention/tests
Phil Wang 5f1ae4a34b
backwards for softcapping (#1033)
* check in the two ways of approaching backwards for softcapping, both functional
* prepare the softcap switch for backwards
* temporary
* cleanup to the way Tri prefers
* calculate dtanh when copying from scores -> dtanh Tensor
* no ternary operators allowed for constexpr, so just use some hack found online
* fix maybe_dtanh, restore some files
* restore another file
* move calculate_dtanh to utils and colocate with apply_softcap
* cleanup
* maybe last cleanup
* save for another pr
* remove a stray line
* fix spacing
* fix an issue, and make test_flash_attn.py ready to test softcapping backwards
2024-07-21 23:25:46 -07:00
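
The squashed commits above add the backward pass for softcapping, which passes raw attention scores through softcap * tanh(scores / softcap) and therefore needs a dtanh factor in the gradient. Below is a minimal PyTorch sketch of that math, assuming the standard tanh softcap; apply_softcap and calculate_dtanh mirror the helper names mentioned in the commits but are reference re-implementations for illustration, not the CUDA kernels in this repo.

```python
import torch

def apply_softcap(scores: torch.Tensor, softcap: float) -> torch.Tensor:
    # Forward: squash raw attention scores into (-softcap, softcap).
    return softcap * torch.tanh(scores / softcap)

def calculate_dtanh(scores_capped: torch.Tensor, softcap: float) -> torch.Tensor:
    # d/dx [softcap * tanh(x / softcap)] = 1 - tanh(x / softcap)^2.
    # tanh(x / softcap) equals scores_capped / softcap, so the derivative can be
    # recovered from the capped scores alone, without keeping the raw scores around.
    return 1.0 - (scores_capped / softcap) ** 2

# Gradient check against autograd (arbitrary shapes, float64 for a tight tolerance).
scores = torch.randn(4, 8, dtype=torch.float64, requires_grad=True)
softcap = 30.0
capped = apply_softcap(scores, softcap)
upstream = torch.randn_like(capped)
capped.backward(upstream)
manual = upstream * calculate_dtanh(apply_softcap(scores.detach(), softcap), softcap)
assert torch.allclose(scores.grad, manual)
```
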
Name                 Last commit                                                         Date
layers               Run isort and black on test files                                   2023-08-18 20:59:35 -07:00
losses               return z_loss (#768)                                                2024-01-21 15:23:41 -08:00
models               Add test for BTLM init                                              2023-12-25 15:16:27 -08:00
modules              Run isort and black on test files                                   2023-08-18 20:59:35 -07:00
ops                  [LayerNorm] Rename layernorm.py -> layer_norm.py                    2024-01-05 00:21:03 -08:00
pyproject.toml       Move pyproject.toml to flash-attn and tests dir to avoid PEP 517    2023-08-25 15:05:28 -07:00
test_flash_attn.py   backwards for softcapping (#1033)                                   2024-07-21 23:25:46 -07:00
test_rotary.py       Fix spurious re-compilations of rotary_kernel (#911)                2024-04-05 13:40:41 -07:00