flash-attention

rocking d8f104e97a Support AMD ROCm on FlashAttention 2 (#1010)	2024-07-22 21:34:37 -07:00

* Support ck in fmha
* Add ck submodule
* Do not return lse if return_softmax == false
* Use receipt to speed up ck compile time
* Integrate new version of ck_tile
* Support dropout for mha_fwd()
* Add dropout to mha_varlen_fwd()
* Update ck to develop
* Extract padding function for dropout randval
* Extract randval transformation function
* Sync the code structure and coding style with FA
* Remove this line, c++ api will handle this. Sync with test_flash_attn.py
* fix compile error
* Add mha_bwd
* Generate dropout seed and offset from user generator
* update CK
* Add mha_varlen_bwd
* Use same python as build flash-attn to generate ck kernel
* Fix bug of group mode fwd about returning softmax lse
* larger the test tollerance
* Add test_flash_attn_output() and test_flash_attn_varlen_output()
* Always fill softmax_lse
* Remove duplicate benchmark script, since we already implement mha_bwd
* Refine get value from tuple
* Use default parameter for stream_config
* unblock all platform
* Add comment
* refine the test code
* Refine naming
* Add unpack to namespace
* Do not hardcode the warp size 64
* Add more targets
* Add README
* Optimize mha_fwd if seqlen_q == 1
* Support get_wheel_url for rocm
* Detect rocm environment by pytorch's IS_HIP_EXTENSION
* update to lastest ck
* Add necessary compile flag
* Sync the api with upstream FA

Co-authored-by: carlushuang <carlus.huang@amd.com>
Co-authored-by: Yichen Yan <wenji.yyc@alibaba-inc.com>
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
Co-authored-by: Yichen Yan <oraluben@outlook.com>

layers	Run isort and black on test files	2023-08-18 20:59:35 -07:00
losses	return z_loss (#768)	2024-01-21 15:23:41 -08:00
models	Add test for BTLM init	2023-12-25 15:16:27 -08:00
modules	Run isort and black on test files	2023-08-18 20:59:35 -07:00
ops	[LayerNorm] Rename layernorm.py -> layer_norm.py	2024-01-05 00:21:03 -08:00
pyproject.toml	Move pyproject.toml to flash-attn and tests dir to avoid PEP 517	2023-08-25 15:05:28 -07:00
test_flash_attn_ck.py	Support AMD ROCm on FlashAttention 2 (#1010)	2024-07-22 21:34:37 -07:00
test_flash_attn.py	backwards for softcapping (#1033)	2024-07-21 23:25:46 -07:00
test_rotary.py	Fix spurious re-compilations of `rotary_kernel` (#911)	2024-04-05 13:40:41 -07:00
test_util.py	Add var-seq-len to FA3 fp16 / bf16 fwd (#1072)	2024-07-22 21:32:41 -07:00