flash-attention/csrc
Directory        Date                        Last commit
flash_attn       2022-11-25 16:30:18 -08:00  Speed up compilation by splitting into separate .cu files
fused_dense_lib  2022-11-15 07:10:25 -08:00  Mention that some CUDA extensions have only been tested on A100s
fused_softmax    2022-10-23 23:04:16 -07:00  Add Megatron attention implementation for benchmarking
layer_norm       2022-11-17 11:45:11 -08:00  [LayerNorm] Compile for both sm70 and sm80
rotary           2022-11-04 22:42:01 -07:00  Implement rotary embedding in CUDA
xentropy         2022-11-15 07:10:25 -08:00  Implement optimized cross-entropy loss in CUDA (per the note above, only tested on A100s)
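Each of these subdirectories is a standalone PyTorch CUDA extension. As a minimal sketch of how such an extension is typically compiled, the setup.py below uses the standard torch.utils.cpp_extension machinery; the module name rotary_emb and the source file names are illustrative assumptions, not the repository's actual build script. The gencode flags mirror the layer_norm commit note above about targeting both sm70 and sm80.

```python
# Minimal sketch of a setup.py for one of the CUDA extensions above.
# The module name "rotary_emb" and the source file names are assumed
# for illustration; check each subdirectory for the real build script.
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name="rotary_emb",
    ext_modules=[
        CUDAExtension(
            name="rotary_emb",
            sources=["rotary.cpp", "rotary_cuda.cu"],
            extra_compile_args={
                "cxx": ["-O3"],
                "nvcc": [
                    "-O3",
                    # Compile for both sm70 (V100) and sm80 (A100), as in
                    # the layer_norm commit above. Note the listing warns
                    # that some extensions have only been tested on A100s.
                    "-gencode", "arch=compute_70,code=sm_70",
                    "-gencode", "arch=compute_80,code=sm_80",
                ],
            },
        )
    ],
    cmdclass={"build_ext": BuildExtension},
)
```

One build-time consideration, reflected in the flash_attn commit above: splitting a large kernel into separate .cu files lets nvcc compile the individual template instantiations in parallel, which is the usual way to cut extension build times.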