flash-attention

Latest commit: Tri Dao a8cfe51551	Implement Tensor Parallel for transformer Block	2022-12-25 14:08:21 -08:00
flash_attn	Simplify BOOL_SWITCH macro to fix compiling error on gcc 7	2022-12-06 14:38:32 -08:00
fused_dense_lib	Implement TensorParallel for FusedDense and FusedDenseGeluDense	2022-12-24 11:48:56 -08:00
fused_softmax	Add Megatron attention implementation for benchmarking	2022-10-23 23:04:16 -07:00
layer_norm	Implement Tensor Parallel for transformer Block	2022-12-25 14:08:21 -08:00
rotary	Implement TensorParallel for MHA	2022-12-25 11:39:55 -08:00
xentropy	Add smoothing for CrossEntropyParallel, rename to CrossEntropyLoss	2022-12-23 14:51:08 -08:00