flash-attention/hopper
jayhshah 5018ac6ac5
Fp8 kernel with "in-kernel" transpose of V in producer (#1100)
* base version

* restructure pipelines, add special fp8 epilogue

* add variants

* add fp8 causal and modify dynamic tile scheduler

* better causal schedule

* maintain two schedules for non causal and causal

* removing macros

* fix regression

* clean up unneeded methods and variants

* fix mistake with NumProducerThreads

* use seqlen traits

* add fp8 .cu files and benchmark script

* fix merge issue

* fix merge issue

* fix merge issue

* remove duplicate code

* fix regression with varseqlen

* move varseqlen init in constexpr

* fix test script

* more constexpr on varseqlen and add max offset

* add back test cases
2024-07-30 14:14:14 -07:00
__init__.py FA3 initial code release 2024-07-11 09:53:36 -07:00
benchmark_attn.py Add var-seq-len to FA3 fp16 / bf16 fwd (#1072) 2024-07-22 21:32:41 -07:00
benchmark_flash_attention_fp8.py Fp8 kernel with "in-kernel" transpose of V in producer (#1100) 2024-07-30 14:14:14 -07:00
block_info.h FA3 initial code release 2024-07-11 09:53:36 -07:00
epilogue_fwd_sm90_tma.hpp Fp8 kernel with "in-kernel" transpose of V in producer (#1100) 2024-07-30 14:14:14 -07:00
flash_api.cpp Fp8 kernel with "in-kernel" transpose of V in producer (#1100) 2024-07-30 14:14:14 -07:00
flash_attn_interface.py Revert "Changes For FP8 (#1075)" 2024-07-25 01:28:44 -07:00
flash_bwd_hdim64_fp16_sm90.cu FA3 initial code release 2024-07-11 09:53:36 -07:00
flash_bwd_hdim128_fp16_sm90.cu FA3 initial code release 2024-07-11 09:53:36 -07:00
flash_bwd_hdim256_fp16_sm90.cu FA3 initial code release 2024-07-11 09:53:36 -07:00
flash_bwd_kernel.h FA3 initial code release 2024-07-11 09:53:36 -07:00
flash_bwd_launch_template.h Remove torchlib dependency from cpp files (#1083) 2024-07-22 16:47:09 -07:00
flash_bwd_preprocess_kernel.h FA3 initial code release 2024-07-11 09:53:36 -07:00
flash_fwd_hdim64_bf16_sm90.cu [FA3] BF16 forward 2024-07-14 23:39:46 -07:00
flash_fwd_hdim64_e4m3_sm90.cu Fp8 kernel with "in-kernel" transpose of V in producer (#1100) 2024-07-30 14:14:14 -07:00
flash_fwd_hdim64_fp16_sm90.cu FA3 initial code release 2024-07-11 09:53:36 -07:00
flash_fwd_hdim128_bf16_sm90.cu [FA3] BF16 forward 2024-07-14 23:39:46 -07:00
flash_fwd_hdim128_e4m3_sm90.cu Fp8 kernel with "in-kernel" transpose of V in producer (#1100) 2024-07-30 14:14:14 -07:00
flash_fwd_hdim128_fp16_sm90.cu FA3 initial code release 2024-07-11 09:53:36 -07:00
flash_fwd_hdim256_bf16_sm90.cu [FA3] BF16 forward 2024-07-14 23:39:46 -07:00
flash_fwd_hdim256_e4m3_sm90.cu Fp8 kernel with "in-kernel" transpose of V in producer (#1100) 2024-07-30 14:14:14 -07:00
flash_fwd_hdim256_fp16_sm90.cu FA3 initial code release 2024-07-11 09:53:36 -07:00
flash_fwd_kernel.h Fp8 kernel with "in-kernel" transpose of V in producer (#1100) 2024-07-30 14:14:14 -07:00
flash_fwd_launch_template.h Fp8 kernel with "in-kernel" transpose of V in producer (#1100) 2024-07-30 14:14:14 -07:00
flash.h Add var-seq-len to FA3 fp16 / bf16 fwd (#1072) 2024-07-22 21:32:41 -07:00
kernel_traits.h Fp8 kernel with "in-kernel" transpose of V in producer (#1100) 2024-07-30 14:14:14 -07:00
mainloop_fwd_sm90_tma_gmma_ws.hpp Fp8 kernel with "in-kernel" transpose of V in producer (#1100) 2024-07-30 14:14:14 -07:00
named_barrier.hpp Fp8 kernel with "in-kernel" transpose of V in producer (#1100) 2024-07-30 14:14:14 -07:00
seq_len.h Add var-seq-len to FA3 fp16 / bf16 fwd (#1072) 2024-07-22 21:32:41 -07:00
setup.py Fp8 kernel with "in-kernel" transpose of V in producer (#1100) 2024-07-30 14:14:14 -07:00
softmax.h Fp8 kernel with "in-kernel" transpose of V in producer (#1100) 2024-07-30 14:14:14 -07:00
static_switch.h Add var-seq-len to FA3 fp16 / bf16 fwd (#1072) 2024-07-22 21:32:41 -07:00
test_flash_attn.py Fp8 kernel with "in-kernel" transpose of V in producer (#1100) 2024-07-30 14:14:14 -07:00
tile_scheduler.hpp Fp8 kernel with "in-kernel" transpose of V in producer (#1100) 2024-07-30 14:14:14 -07:00
utils.h Fp8 kernel with "in-kernel" transpose of V in producer (#1100) 2024-07-30 14:14:14 -07:00