* check in the two ways of approaching backwards for softcapping, both functional
* prepare the softcap switch for backwards
* temporary
* cleanup to the way Tri prefers
* calculate dtanh when copying from scores -> dtanh Tensor (see the sketch after this list)
* no ternary operators allowed for constexpr, so just use some hack found online
* fix maybe_dtanh, restore some files
* restore another file
* move calculate_dtanh to utils and colocate with apply_softcap
* cleanup
* maybe last cleanup
* save for another pr
* remove a stray line
* fix spacing
* fix an issue, and make test_flash_attn.py ready to test softcapping backwards
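The math behind these commits is compact enough to sketch. Below is a minimal standalone C++ sketch, not the kernel code from this PR: `apply_softcap` caps raw attention scores with a scaled tanh, and `calculate_dtanh` is the factor the backward pass multiplies the score gradients by. The two function names come from the commit list above; the signatures, the float types, and the finite-difference check are assumptions made for illustration.

```cpp
// Minimal sketch of tanh softcapping and its derivative.
// apply_softcap / calculate_dtanh are named after the commit list above,
// but these standalone signatures and the self-check are illustrative only.
#include <cmath>
#include <cstdio>

// Forward: squash a raw attention score into the range (-softcap, softcap).
float apply_softcap(float score, float softcap) {
    return softcap * std::tanh(score / softcap);
}

// Backward: d/dscore [softcap * tanh(score / softcap)] = 1 - tanh^2(score / softcap),
// the factor the score gradient is multiplied by in the backward pass.
float calculate_dtanh(float score, float softcap) {
    float t = std::tanh(score / softcap);
    return 1.0f - t * t;
}

int main() {
    const float softcap = 30.0f, score = 12.5f, eps = 1e-3f;
    // Sanity-check the analytic derivative against a central finite difference.
    float analytic = calculate_dtanh(score, softcap);
    float numeric = (apply_softcap(score + eps, softcap) -
                     apply_softcap(score - eps, softcap)) / (2.0f * eps);
    std::printf("analytic %.6f vs numeric %.6f\n", analytic, numeric);
    return 0;
}
```

Since 1 - tanh^2 can be recovered from the already-softcapped value itself, the backward factor can presumably be produced while the scores are being copied out, which is what "calculate dtanh when copying from scores -> dtanh Tensor" appears to describe.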
| .. | ||
| cutlass@756c351b49 | ||
| flash_attn | ||
| ft_attention | ||
| fused_dense_lib | ||
| fused_softmax | ||
| layer_norm | ||
| rotary | ||
| xentropy | ||