squall/flash-attention
csrc (at commit f95c2fc108)
Latest commit: 6738d9477d [LayerNorm] Implement RMS Norm (Tri Dao, 2023-01-06 17:34:22 -08:00)
Directory        Last commit                                                          Date
flash_attn       [Compilation] Change BOOL_SWITCH to fix Windows compilation         2023-01-06 14:40:58 -08:00
ft_attention     [Gen, FT] Use fp32 accum for FMA                                    2023-01-03 22:09:22 -08:00
fused_dense_lib  Implement TensorParallel for FusedDense and FusedDenseGeluDense     2022-12-24 11:48:56 -08:00
fused_softmax    Add Megatron attention implementation for benchmarking              2022-10-23 23:04:16 -07:00
layer_norm       [LayerNorm] Implement RMS Norm                                      2023-01-06 17:34:22 -08:00
rotary           Implement TensorParallel for MHA                                    2022-12-25 11:39:55 -08:00
xentropy         Add smoothing for CrossEntropyParallel, rename to CrossEntropyLoss  2022-12-23 14:51:08 -08:00