Commit Graph

20 Commits

Author  SHA1  Message  Date
Tri Dao  abbc131173  [LayerNorm] Switch from CUDA to Triton implementation  2024-01-05 00:31:17 -08:00
Tri Dao  f1a73d0740  Run isort and black on python files  2023-08-18 14:22:11 -07:00
Tri Dao  75e334d407  [MLP] Add ParallelMLP  2023-07-22 23:45:51 -07:00
Tri Dao  b3177dfaf6  [GPT] Enable FlashAttention for GPT-J  2023-07-21 17:29:10 -07:00
Tri Dao  6fc1e07da2  [Block] Re-enable DropPath  2023-07-21 16:39:23 -07:00
Tri Dao  4f285b3547  FlashAttention-2 release  2023-07-17 06:21:34 -07:00
ljss  8e44c0eefb  Fix a bug  2023-06-02 13:46:19 +08:00
Federico Berto  3889ba168b  [BugFix] cannot unpack non-iterable NoneType object  2023-05-07 03:07:30 +09:00
Tri Dao  ba2fe7f378  [Gen] Move allocate_inference_cache to within the model  2023-04-20 18:15:12 -07:00
Tri Dao  96d10f6545  Implement LLaMa  2023-04-18 21:51:35 -07:00
Tri Dao  393882bc08  [LayerNorm] Implement LN with parallel residual, support dim 8k  2023-03-31 14:23:45 -07:00
Tri Dao  4d87e4d875  Implement GPT-J  2023-03-22 16:16:58 -07:00
Tri Dao  88173a1aaf  [FusedDense] Support relu, rename FusedDenseGeluDense -> FusedMLP  2023-01-17 18:12:27 -08:00
Tri Dao  780e8eeabb  [ViT] Support timm checkpoint, add tests  2023-01-16 01:20:34 -08:00
Tri Dao  ef085cfcda  [ViT] Fix extra norm_0, use new LN order in Block  2023-01-15 22:58:56 -08:00
Tri Dao  ff34123bd4  Reorder LN in Block, support OPT  2023-01-15 22:14:31 -08:00
Tri Dao  93383bd55b  [TP] Implement TensorParallel without sequence parallel  2023-01-07 13:45:22 -08:00
Tri Dao  a8cfe51551  Implement Tensor Parallel for transformer Block  2022-12-25 14:08:21 -08:00
Tri Dao  5fb6df0e04  Implement BERT  2022-12-18 21:47:27 -08:00
Tri Dao  d4b320b31f  Add MLP, MHA, Block, Embedding modules  2022-11-13 22:06:44 -08:00