flash-attention

Author	SHA1	Message	Date
Tri Dao	f5d0fbd468	[FT] Fix FT's single query attention for bf16 hdim128 rotary	2023-03-28 21:27:00 -07:00
Tri Dao	4d87e4d875	Implement GPT-J	2023-03-22 16:16:58 -07:00
Tri Dao	780e8eeabb	[ViT] Support timm checkpoint, add tests	2023-01-16 01:20:34 -08:00
Tri Dao	7c2191542a	[Gen] Make generation work with Tensor Parallel	2023-01-15 11:34:27 -08:00
Tri Dao	0938298e4c	[Gen] Adjust shape of kv_cache when using FT	2023-01-07 17:27:54 -08:00
Tri Dao	11be742aa3	[Gen] Test generation with rotary embedding	2023-01-07 14:37:54 -08:00
Tri Dao	8d9674ed08	Merge pull request #102 from Lamikins/main: fixed cross attention typeerror	2023-01-07 13:56:20 -08:00
Tri Dao	93383bd55b	[TP] Implement TensorParallel without sequence parallel	2023-01-07 13:45:22 -08:00
Darius Lam	aec35fd67c	fixed cross attention typeerror	2023-01-07 12:58:41 -08:00
Tri Dao	a668890fcd	[Gen] Add option to run generation with FT attention kernel	2023-01-03 22:10:31 -08:00
Tri Dao	65b4064b2a	[FusedDense] Kick off input all_gather before weight dtype conversion	2022-12-31 22:47:34 -08:00
Tri Dao	a6ec1782dc	Bump to v0.2.6	2022-12-27 22:05:20 -08:00
Tri Dao	63670fd84a	Implement generation for GPT	2022-12-27 21:01:50 -08:00
Tri Dao	1e712ea8b0	Implement TensorParallel for MHA	2022-12-25 11:39:55 -08:00
Tri Dao	e68ebbe89a	Simplify FusedDense	2022-12-22 21:25:31 -08:00
Tri Dao	496e4f528c	Implement XPos (Sun et al.)	2022-12-21 14:17:58 -08:00
Tri Dao	13cdceb377	Implement last_layer_subset optimization for BERT	2022-12-19 22:18:46 -08:00
Tri Dao	5fb6df0e04	Implement BERT	2022-12-18 21:47:27 -08:00
Tri Dao	d4b320b31f	Add MLP, MHA, Block, Embedding modules	2022-11-13 22:06:44 -08:00

19 Commits