flash-attention/csrc
rocking e2182cc21d
Support page kvcache in AMD ROCm (#1198)
* Integrate ck branch of ck_tile/fa_bwd_opt

* Assume dq and q share the same stride

* update ck

* Integrate more stride of dq_acc

* Revert fwd dropout

* Fix parameter order

* Integrate ck with more stride

* update the limit of hdim of bwd

* Check argument

* Add test_flash_attn_causal

* Support unpad lse

* Add test_flash_attn_varlen_causal, test_flash_attn_race_condition, test_flash_attn_bwd_overflow, test_flash_attn_bwd_transpose, test_flash_attn_bwd_varlen_overflow, test_flash_attn_deterministic, test_flash_attn_varlen_deterministic

* Fix stride and Kn0

* Fix CK sync issue

* Fix typo

* Update CK for changing of fmha_fwd_args

* Add kvcache tmp

* Add kvcache

* Fix comment

* Sync behavior with ck

* Update CK to develop

* remove large test case

* Add kvcache test

* Fix page_block_size in arg

* Minor fix

* Fix stride error

* Update seqlen of kvcache before splitkv

* Fix compile error

* Fix bug when hdim is not a multiple of 8

* Fit ck arg

* support adaptive num_splits

* add more tests

* Refine test tolerance

* update CK

* Move override_num_splits_if_necessary into cpp

* update ck

* Update ck

* Support different flag for different version of hip

* remove coerce-illegal, because this is not required in FA

* Update ck to fix scratch memory

* Add coerce-illegal in some version

* Add compile flag for rtn rounding

* remove redundant init

* Using env var to switch rounding mode

* update ck
2024-09-15 23:17:28 -07:00
composable_kernel@a9b170b541 Support page kvcache in AMD ROCm (#1198) 2024-09-15 23:17:28 -07:00
cutlass@756c351b49 [FA3] BF16 forward 2024-07-14 23:39:46 -07:00
flash_attn Split bwd into more .cu files to speed up compilation 2024-07-23 01:32:09 -07:00
flash_attn_ck Support page kvcache in AMD ROCm (#1198) 2024-09-15 23:17:28 -07:00
ft_attention Make nvcc threads configurable via environment variable (#885) 2024-03-13 20:46:57 -07:00
fused_dense_lib Make nvcc threads configurable via environment variable (#885) 2024-03-13 20:46:57 -07:00
fused_softmax Make nvcc threads configurable via environment variable (#885) 2024-03-13 20:46:57 -07:00
layer_norm Make nvcc threads configurable via environment variable (#885) 2024-03-13 20:46:57 -07:00
rotary Make nvcc threads configurable via environment variable (#885) 2024-03-13 20:46:57 -07:00
xentropy Make nvcc threads configurable via environment variable (#885) 2024-03-13 20:46:57 -07:00