# Attention kernel from FasterTransformer

This CUDA extension wraps the single-query attention kernel from FasterTransformer v5.2.1 for benchmarking purposes.

```sh
cd csrc/ft_attention && pip install .
```
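
After installation, a quick smoke test like the one below can confirm that the extension compiled and imports correctly. This is a hedged sketch, not part of the extension's documented API: the module name `ft_attention` matches `setup.py`, but the exported entry points and their signatures are defined in `ft_attention.cpp` and are not assumed here; `time_cuda` is a generic CUDA timing helper for benchmarking whatever callable you wire up against the kernel.

```python
# Hedged sketch: verify the extension loads and provide a generic CUDA timing
# helper. Nothing here assumes the kernel's argument list or tensor layout;
# check ft_attention.cpp for the authoritative entry points.
import torch
import ft_attention


def time_cuda(fn, iters=100, warmup=10):
    """Average milliseconds per call of a CUDA callable, with proper sync."""
    for _ in range(warmup):  # warm-up: exclude one-time compilation/allocator costs
        fn()
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters


if __name__ == "__main__":
    # Listing the exported names confirms the pybind module built and loaded.
    print([name for name in dir(ft_attention) if not name.startswith("_")])
```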