
This CUDA extension implements fused dropout + residual + LayerNorm, building on Apex's FastLayerNorm. We add dropout and residual, and make it work for both pre-norm and post-norm architectures. We also extend support to more hidden dimensions: any dimension divisible by 8, up to 6144.
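For reference, the fused operation is equivalent to the following unfused PyTorch computation. This is a minimal sketch for illustration only; `dropout_add_layernorm_ref` and its arguments are hypothetical names, not the extension's API.

```python
# Unfused reference sketch of what the fused kernel computes (hypothetical names,
# not the extension's API): dropout on x0, add the residual, then LayerNorm.
import torch
import torch.nn.functional as F

def dropout_add_layernorm_ref(x0, residual, weight, bias, dropout_p, eps, prenorm=False):
    dropped = F.dropout(x0, p=dropout_p, training=True)
    added = dropped if residual is None else dropped + residual
    out = F.layer_norm(added, (added.shape[-1],), weight, bias, eps)
    # Pre-norm blocks also need the un-normalized sum as the residual stream for
    # the next layer; post-norm blocks only need the normalized output.
    return (out, added) if prenorm else out
```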

If you want to use it for dimensions larger than 6k, please file an issue.

This extension has only been tested on A100s.

```sh
cd csrc/layer_norm && pip install .
```
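Once built, the op is used through the Python wrappers in the `flash_attn` package. The sketch below assumes a `dropout_add_layer_norm` function under `flash_attn.ops.layer_norm` with roughly the arguments shown; check the wrapper in the repo for the exact import path and signature.

```python
# Hedged usage sketch: the module path and argument order are assumptions;
# consult the flash_attn Python wrappers for the actual interface.
import torch
from flash_attn.ops.layer_norm import dropout_add_layer_norm  # assumed location

hidden = 2048  # must be divisible by 8 and at most 6144
x0 = torch.randn(8, 512, hidden, device="cuda", dtype=torch.float16)
residual = torch.randn_like(x0)
weight = torch.ones(hidden, device="cuda", dtype=torch.float16)
bias = torch.zeros(hidden, device="cuda", dtype=torch.float16)

# Fused dropout(x0) + residual followed by LayerNorm; with prenorm=True the
# pre-normalization sum is also returned for the next block's residual stream.
out = dropout_add_layer_norm(x0, residual, weight, bias, 0.1, 1e-5, prenorm=False)
```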