squall / flash-attention
csrc (at commit b78f5a392d)

Latest commit: 5db330519a, [LayerNorm] Support taking subset of input or subset of output (Tri Dao, 2022-12-12 22:16:14 -08:00)
| Directory       | Last commit message                                                                           | Date                       |
|-----------------|-----------------------------------------------------------------------------------------------|----------------------------|
| flash_attn      | Simplify BOOL_SWITCH macro to fix compiling error on gcc 7 (see the BOOL_SWITCH sketch below) | 2022-12-06 14:38:32 -08:00 |
| fused_dense_lib | Mention that some CUDA extensions have only been tested on A100s                              | 2022-11-15 07:10:25 -08:00 |
| fused_softmax   | Add Megatron attention implementation for benchmarking                                        | 2022-10-23 23:04:16 -07:00 |
| layer_norm      | [LayerNorm] Support taking subset of input or subset of output                                | 2022-12-12 22:16:14 -08:00 |
| rotary          | Implement rotary embedding in CUDA (see the rotary sketch below)                              | 2022-11-04 22:42:01 -07:00 |
| xentropy        | Mention that some CUDA extensions have only been tested on A100s                              | 2022-11-15 07:10:25 -08:00 |
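The flash_attn entry refers to a BOOL_SWITCH macro, a dispatch pattern for turning a runtime boolean into a compile-time template parameter by instantiating both branches. Below is a minimal sketch of that pattern, assuming the common immediately-invoked-lambda form; the `run_attention` function and its parameters are illustrative, not the repo's actual code.

```cpp
#include <cstdio>

// Hypothetical kernel entry point templated on a compile-time bool.
template <bool IsCausal>
void run_attention(int seqlen) {
    std::printf("causal=%d seqlen=%d\n", IsCausal, seqlen);
}

// BOOL_SWITCH pattern: both branches are compiled, each binding the
// runtime condition to a constexpr bool that can be used as a template
// argument inside the passed-in lambda. A plain if/else with a local
// constexpr bool is a widely portable form; the commit message suggests
// fancier variants of this macro tripped up gcc 7.
#define BOOL_SWITCH(COND, CONST_NAME, ...)     \
    [&] {                                      \
        if (COND) {                            \
            constexpr bool CONST_NAME = true;  \
            return __VA_ARGS__();              \
        } else {                               \
            constexpr bool CONST_NAME = false; \
            return __VA_ARGS__();              \
        }                                      \
    }()

int main() {
    bool is_causal = true;  // value known only at runtime
    BOOL_SWITCH(is_causal, IsCausalConst, [&] {
        run_attention<IsCausalConst>(128);
    });
}
```

Specializing the kernel on the flag this way avoids any per-element runtime branching inside the hot loop, at the cost of compiling one instantiation per branch.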
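The rotary directory implements rotary position embeddings (RoPE) in CUDA. As a rough illustration of the underlying operation (a standalone sketch, not the repo's kernel), the code below rotates each (x_i, x_{i+d/2}) pair of a head-dimension vector by a position-dependent angle theta_i = pos * 10000^(-2i/d); the layout and all names are assumptions for illustration.

```cpp
#include <cuda_runtime.h>
#include <math.h>
#include <stdio.h>

// Apply rotary embedding in place: one block per token position,
// one thread per (x1, x2) pair in the head dimension.
__global__ void rotary_kernel(float* x, int seqlen, int dim) {
    int pos = blockIdx.x;   // token position
    int i   = threadIdx.x;  // pair index, i < dim / 2
    if (pos >= seqlen || i >= dim / 2) return;

    float theta = pos * powf(10000.0f, -2.0f * i / dim);
    float c = cosf(theta), s = sinf(theta);

    float* v = x + pos * dim;
    float x1 = v[i], x2 = v[i + dim / 2];
    v[i]           = x1 * c - x2 * s;  // rotate the pair by theta
    v[i + dim / 2] = x1 * s + x2 * c;
}

int main() {
    const int seqlen = 4, dim = 8;
    float h[seqlen * dim];
    for (int i = 0; i < seqlen * dim; ++i) h[i] = 1.0f;

    float* d;
    cudaMalloc(&d, sizeof(h));
    cudaMemcpy(d, h, sizeof(h), cudaMemcpyHostToDevice);
    rotary_kernel<<<seqlen, dim / 2>>>(d, seqlen, dim);
    cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost);
    cudaFree(d);

    // Position 0 is unrotated (theta = 0); later positions differ.
    printf("pos1 pair0: %.3f %.3f\n", h[dim + 0], h[dim + dim / 2]);
}
```

Because the rotation depends only on position and pair index, it can be fused into the attention input pipeline rather than materializing rotated copies, which is the usual motivation for a dedicated CUDA kernel.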