flash-attention/csrc

Latest commit 83e41b3ca4 by Antoni Viros, 2024-09-17 19:49:26 -07:00:
Add custom ops for compatibility with PT Compile (#1139)

Squashed commit messages:

* Add custom ops for compatibility with PT Compile
* Add support for varlen functions too
* Add version checks for the PyTorch API
* Fix the PT Compile interfaces so they work end to end
* Make sure PT < 2.4 runs fine
* Fix a Python mistake
* Fix all the autograd magic issues
* Fix a typo on head_dim
* Fix deterministic test failures and remove unneeded detach() calls
* Remove requires_grad from the test
* Resolve all the PyTorch versioning issues
* C++ and Python refactor to improve padding management for torch.compile()
* Add improvements suggested by @anijain2305
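Several of the commit messages above revolve around gating behavior on the installed PyTorch version ("Add version checks for pytorch API", "Make sure PT < 2.4 runs fine"): on PyTorch >= 2.4 the kernels can be registered as custom ops via the `torch.library` API so that `torch.compile` traces through them, while older versions fall back to the plain `autograd.Function` path. Below is a minimal, self-contained sketch of that version-gating pattern; the helper names `parse_torch_version` and `should_register_custom_ops` are illustrative, not taken from the repository.

```python
def parse_torch_version(version: str) -> tuple:
    """Extract (major, minor) from strings like '2.4.0a0+git1234'.

    Local build tags after '+' and pre-release suffixes are ignored;
    only major.minor matter for this feature gate.
    """
    main = version.split("+", 1)[0]
    nums = []
    for part in main.split(".")[:2]:
        digits = ""
        for ch in part:
            if ch.isdigit():
                digits += ch
            else:
                break  # stop at the first non-digit (e.g. 'a0' in '2.4.0a0')
        nums.append(int(digits or 0))
    while len(nums) < 2:
        nums.append(0)
    return (nums[0], nums[1])


def should_register_custom_ops(version: str) -> bool:
    """Custom-op registration relies on the torch.library API from PyTorch 2.4."""
    return parse_torch_version(version) >= (2, 4)


# In real code the argument would be torch.__version__; hard-coded here
# so the sketch runs without PyTorch installed.
assert should_register_custom_ops("2.4.0")
assert not should_register_custom_ops("2.3.1+cu121")
```

In the real extension, the `True` branch of such a check would register the CUDA kernels with `torch.library` (so the compiler sees opaque, traceable ops), and the `False` branch would keep the pre-2.4 call path intact.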
Directory contents (entry, last commit, date):

* composable_kernel@a9b170b541: Support page kvcache in AMD ROCm (#1198), 2024-09-15 23:17:28 -07:00
* cutlass@756c351b49: [FA3] BF16 forward, 2024-07-14 23:39:46 -07:00
* flash_attn: Add custom ops for compatibility with PT Compile (#1139), 2024-09-17 19:49:26 -07:00
* flash_attn_ck: Support page kvcache in AMD ROCm (#1198), 2024-09-15 23:17:28 -07:00
* ft_attention: Make nvcc threads configurable via environment variable (#885), 2024-03-13 20:46:57 -07:00
* fused_dense_lib: Make nvcc threads configurable via environment variable (#885), 2024-03-13 20:46:57 -07:00
* fused_softmax: Make nvcc threads configurable via environment variable (#885), 2024-03-13 20:46:57 -07:00
* layer_norm: Make nvcc threads configurable via environment variable (#885), 2024-03-13 20:46:57 -07:00
* rotary: Make nvcc threads configurable via environment variable (#885), 2024-03-13 20:46:57 -07:00
* xentropy: Make nvcc threads configurable via environment variable (#885), 2024-03-13 20:46:57 -07:00