flash-attention/csrc/layer_norm

This CUDA extension implements fused dropout + residual + LayerNorm, based on Apex's FastLayerNorm. We add the dropout and residual connection, and support both pre-norm and post-norm architectures.
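In unfused PyTorch terms, the kernel computes LayerNorm over the dropped-out input plus the residual; in the pre-norm setting, the un-normalized residual stream is also returned so it can feed the next block. Below is a minimal reference sketch of these semantics (the function name and signature are illustrative, not this extension's actual API):

```python
import torch
import torch.nn.functional as F

def dropout_add_layer_norm_ref(x0, x1, weight, bias, p, eps, prenorm=False):
    """Unfused reference for the semantics this extension fuses into one kernel."""
    # Dropout on the new branch, then add the residual stream.
    residual = F.dropout(x0, p=p) + x1
    out = F.layer_norm(residual, (residual.shape[-1],), weight, bias, eps)
    # Pre-norm blocks also need the un-normalized residual for the next layer.
    return (out, residual) if prenorm else out

# Example usage with illustrative shapes (batch, seqlen, hidden):
x0 = torch.randn(8, 1024, 768)  # branch output
x1 = torch.randn(8, 1024, 768)  # residual stream
w, b = torch.ones(768), torch.zeros(768)
out, res = dropout_add_layer_norm_ref(x0, x1, w, b, p=0.1, eps=1e-5, prenorm=True)
```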

It has only been tested on A100s.

To install:

```sh
cd csrc/layer_norm && pip install .
```
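After installation, a quick sanity check (assuming the compiled extension module is named `dropout_layer_norm`, per this extension's setup.py; adjust if your build names it differently):

```python
# Hypothetical sanity check: the module name comes from setup.py and is
# assumed to be "dropout_layer_norm" here.
import dropout_layer_norm  # raises ImportError if the build failed
print(dropout_layer_norm)
```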