flash-attention/csrc/flash_attn/src
2023-11-19 22:20:01 -08:00
block_info.h Swap seqlen_q and nheads for MQA to speed it up (h/t Daniel Haziza) 2023-09-10 22:56:33 -07:00
flash_bwd_hdim32_bf16_sm80.cu Use generate_kernels.py script from Driss Guessous 2023-08-28 13:34:12 -07:00
flash_bwd_hdim32_fp16_sm80.cu Use generate_kernels.py script from Driss Guessous 2023-08-28 13:34:12 -07:00
flash_bwd_hdim64_bf16_sm80.cu Use generate_kernels.py script from Driss Guessous 2023-08-28 13:34:12 -07:00
flash_bwd_hdim64_fp16_sm80.cu Use generate_kernels.py script from Driss Guessous 2023-08-28 13:34:12 -07:00
flash_bwd_hdim96_bf16_sm80.cu Use generate_kernels.py script from Driss Guessous 2023-08-28 13:34:12 -07:00
flash_bwd_hdim96_fp16_sm80.cu Use generate_kernels.py script from Driss Guessous 2023-08-28 13:34:12 -07:00
flash_bwd_hdim128_bf16_sm80.cu Use generate_kernels.py script from Driss Guessous 2023-08-28 13:34:12 -07:00
flash_bwd_hdim128_fp16_sm80.cu Use generate_kernels.py script from Driss Guessous 2023-08-28 13:34:12 -07:00
flash_bwd_hdim160_bf16_sm80.cu Use generate_kernels.py script from Driss Guessous 2023-08-28 13:34:12 -07:00
flash_bwd_hdim160_fp16_sm80.cu Use generate_kernels.py script from Driss Guessous 2023-08-28 13:34:12 -07:00
flash_bwd_hdim192_bf16_sm80.cu Use generate_kernels.py script from Driss Guessous 2023-08-28 13:34:12 -07:00
flash_bwd_hdim192_fp16_sm80.cu Use generate_kernels.py script from Driss Guessous 2023-08-28 13:34:12 -07:00
flash_bwd_hdim224_bf16_sm80.cu Use generate_kernels.py script from Driss Guessous 2023-08-28 13:34:12 -07:00
flash_bwd_hdim224_fp16_sm80.cu Use generate_kernels.py script from Driss Guessous 2023-08-28 13:34:12 -07:00
flash_bwd_hdim256_bf16_sm80.cu Use generate_kernels.py script from Driss Guessous 2023-08-28 13:34:12 -07:00
flash_bwd_hdim256_fp16_sm80.cu Use generate_kernels.py script from Driss Guessous 2023-08-28 13:34:12 -07:00
flash_bwd_kernel.h Write zero to out / grad if seqlen_q or seqlen_k is zero 2023-11-19 22:20:01 -08:00
flash_bwd_launch_template.h add checks (#640) 2023-11-19 20:43:27 -08:00
flash_fwd_hdim32_bf16_sm80.cu Use generate_kernels.py script from Driss Guessous 2023-08-28 13:34:12 -07:00
flash_fwd_hdim32_fp16_sm80.cu Use generate_kernels.py script from Driss Guessous 2023-08-28 13:34:12 -07:00
flash_fwd_hdim64_bf16_sm80.cu Use generate_kernels.py script from Driss Guessous 2023-08-28 13:34:12 -07:00
flash_fwd_hdim64_fp16_sm80.cu Use generate_kernels.py script from Driss Guessous 2023-08-28 13:34:12 -07:00
flash_fwd_hdim96_bf16_sm80.cu Use generate_kernels.py script from Driss Guessous 2023-08-28 13:34:12 -07:00
flash_fwd_hdim96_fp16_sm80.cu Use generate_kernels.py script from Driss Guessous 2023-08-28 13:34:12 -07:00
flash_fwd_hdim128_bf16_sm80.cu Use generate_kernels.py script from Driss Guessous 2023-08-28 13:34:12 -07:00
flash_fwd_hdim128_fp16_sm80.cu Use generate_kernels.py script from Driss Guessous 2023-08-28 13:34:12 -07:00
flash_fwd_hdim160_bf16_sm80.cu Use generate_kernels.py script from Driss Guessous 2023-08-28 13:34:12 -07:00
flash_fwd_hdim160_fp16_sm80.cu Use generate_kernels.py script from Driss Guessous 2023-08-28 13:34:12 -07:00
flash_fwd_hdim192_bf16_sm80.cu Use generate_kernels.py script from Driss Guessous 2023-08-28 13:34:12 -07:00
flash_fwd_hdim192_fp16_sm80.cu Use generate_kernels.py script from Driss Guessous 2023-08-28 13:34:12 -07:00
flash_fwd_hdim224_bf16_sm80.cu Use generate_kernels.py script from Driss Guessous 2023-08-28 13:34:12 -07:00
flash_fwd_hdim224_fp16_sm80.cu Use generate_kernels.py script from Driss Guessous 2023-08-28 13:34:12 -07:00
flash_fwd_hdim256_bf16_sm80.cu Use generate_kernels.py script from Driss Guessous 2023-08-28 13:34:12 -07:00
flash_fwd_hdim256_fp16_sm80.cu Use generate_kernels.py script from Driss Guessous 2023-08-28 13:34:12 -07:00
flash_fwd_kernel.h Write zero to out / grad if seqlen_q or seqlen_k is zero 2023-11-19 22:20:01 -08:00
flash_fwd_launch_template.h add checks (#640) 2023-11-19 20:43:27 -08:00
flash_fwd_split_hdim32_bf16_sm80.cu Implement splitKV attention 2023-08-29 00:58:29 -07:00
flash_fwd_split_hdim32_fp16_sm80.cu Implement splitKV attention 2023-08-29 00:58:29 -07:00
flash_fwd_split_hdim64_bf16_sm80.cu Implement splitKV attention 2023-08-29 00:58:29 -07:00
flash_fwd_split_hdim64_fp16_sm80.cu Implement splitKV attention 2023-08-29 00:58:29 -07:00
flash_fwd_split_hdim96_bf16_sm80.cu Implement splitKV attention 2023-08-29 00:58:29 -07:00
flash_fwd_split_hdim96_fp16_sm80.cu Implement splitKV attention 2023-08-29 00:58:29 -07:00
flash_fwd_split_hdim128_bf16_sm80.cu Implement splitKV attention 2023-08-29 00:58:29 -07:00
flash_fwd_split_hdim128_fp16_sm80.cu Implement splitKV attention 2023-08-29 00:58:29 -07:00
flash_fwd_split_hdim160_bf16_sm80.cu Implement splitKV attention 2023-08-29 00:58:29 -07:00
flash_fwd_split_hdim160_fp16_sm80.cu Implement splitKV attention 2023-08-29 00:58:29 -07:00
flash_fwd_split_hdim192_bf16_sm80.cu Implement splitKV attention 2023-08-29 00:58:29 -07:00
flash_fwd_split_hdim192_fp16_sm80.cu Implement splitKV attention 2023-08-29 00:58:29 -07:00
flash_fwd_split_hdim224_bf16_sm80.cu Implement splitKV attention 2023-08-29 00:58:29 -07:00
flash_fwd_split_hdim224_fp16_sm80.cu Implement splitKV attention 2023-08-29 00:58:29 -07:00
flash_fwd_split_hdim256_bf16_sm80.cu Implement splitKV attention 2023-08-29 00:58:29 -07:00
flash_fwd_split_hdim256_fp16_sm80.cu Implement splitKV attention 2023-08-29 00:58:29 -07:00
flash.h [Gen] Accept cache_batch_idx to index into the KV cache 2023-10-03 16:27:26 -07:00
generate_kernels.py Implement splitKV attention 2023-08-29 00:58:29 -07:00
kernel_traits_sm90.h FlashAttention-2 release 2023-07-17 06:21:34 -07:00
kernel_traits.h Implement rotary embedding in flash_attn_with_kvcache 2023-09-16 01:20:16 -07:00
philox.cuh FlashAttention-2 release 2023-07-17 06:21:34 -07:00
softmax.h Implement local attention 2023-09-26 16:31:08 -07:00
static_switch.h Fix compile error on MSVC 2023-07-19 08:04:57 +00:00
utils.h Implement rotary embedding in flash_attn_with_kvcache 2023-09-16 01:20:16 -07:00
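
The `.cu` files above follow a regular naming scheme: one file per combination of kernel direction (`bwd`, `fwd`, `fwd_split`), head dimension, and dtype, all targeting `sm80`. The sketch below only reproduces that naming pattern as it appears in this listing. It is not the contents of `generate_kernels.py`; the constant and function names (`DIRECTIONS`, `HEAD_DIMS`, `DTYPES`, `kernel_filenames`) are hypothetical and chosen for illustration.

```python
from itertools import product

# Values read off the listing above; the names of these constants are
# assumptions for illustration, not taken from generate_kernels.py.
DIRECTIONS = ["bwd", "fwd", "fwd_split"]
HEAD_DIMS = [32, 64, 96, 128, 160, 192, 224, 256]
DTYPES = ["bf16", "fp16"]
SM = 80


def kernel_filenames():
    """Return the per-kernel .cu file names implied by the listing."""
    return [
        f"flash_{direction}_hdim{head_dim}_{dtype}_sm{SM}.cu"
        for direction, head_dim, dtype in product(DIRECTIONS, HEAD_DIMS, DTYPES)
    ]


names = kernel_filenames()
print(len(names))  # 3 directions * 8 head dims * 2 dtypes = 48
print(names[0])    # flash_bwd_hdim32_bf16_sm80.cu
```

The 48 generated names match the 48 `.cu` entries in the listing (16 each for `bwd`, `fwd`, and `fwd_split`).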