flash-attention

Author	SHA1	Message	Date
Tri Dao	993d12448e	Implement GPT-NeoX	2023-03-29 01:21:25 -07:00
Tri Dao	f5d0fbd468	[FT] Fix FT's single query attention for bf16 hdim128 rotary	2023-03-28 21:27:00 -07:00
Tri Dao	4d87e4d875	Implement GPT-J	2023-03-22 16:16:58 -07:00
Tri Dao	5d079fdd7a	[Triton] Fix benchmark_causal, mention Triton version	2023-03-22 00:51:16 -07:00
Tri Dao	dc08ea1c33	Support H100 for other CUDA extensions	2023-03-15 16:59:27 -07:00
Vik Paruchuri	3165398074	Remove unused kwargs in flashattention	2023-03-15 10:36:19 -07:00
Tri Dao	e45a46a5b7	[Rotary] Implement GPT-J style (interleaved) rotary	2023-03-14 14:35:53 -07:00
Tri Dao	78b7a1dc18	[OPT] Load fp16 weights on CPU before moving to GPU	2023-01-22 17:01:32 -08:00
Tri Dao	eb33e587e9	[LayerNorm] Rename x1 -> residual	2023-01-19 13:07:27 -08:00
Tri Dao	f68d41ec77	[Gen] Add OPT to generation test	2023-01-17 19:59:06 -08:00
Tri Dao	88173a1aaf	[FusedDense] Support relu, rename FusedDenseGeluDense -> FusedMLP	2023-01-17 18:12:27 -08:00
Tri Dao	780e8eeabb	[ViT] Support timm checkpoint, add tests	2023-01-16 01:20:34 -08:00
Tri Dao	2ec7d3f72c	Merge pull request #105 from jamaliki/patch-1: Change default dropout value in documentation	2023-01-15 23:01:20 -08:00
Tri Dao	ef085cfcda	[ViT] Fix extra norm_0, use new LN order in Block	2023-01-15 22:58:56 -08:00
Tri Dao	ff34123bd4	Reorder LN in Block, support OPT	2023-01-15 22:14:31 -08:00
Tri Dao	7c2191542a	[Gen] Make generation work with Tensor Parallel	2023-01-15 11:34:27 -08:00
Kiarash Jamali	41cb909741	Change default dropout value in documentation. Documentation says default is 0.1, but the code has attention_dropout default at 0.0.	2023-01-13 10:50:07 +00:00
Tri Dao	f95c2fc108	[Gen] Remove commented code	2023-01-07 19:06:39 -08:00
Tri Dao	b48599002a	[Gen] Add timing option	2023-01-07 19:05:09 -08:00
Tri Dao	0938298e4c	[Gen] Adjust shape of kv_cache when using FT	2023-01-07 17:27:54 -08:00
Tri Dao	e02fd588aa	[Gen] Implement top-k and top-p sampling	2023-01-07 17:00:02 -08:00
Tri Dao	11be742aa3	[Gen] Test generation with rotary embedding	2023-01-07 14:37:54 -08:00
Tri Dao	8d9674ed08	Merge pull request #102 from Lamikins/main: fixed cross attention typeerror	2023-01-07 13:56:20 -08:00
Tri Dao	93383bd55b	[TP] Implement TensorParallel without sequence parallel	2023-01-07 13:45:22 -08:00
Darius Lam	aec35fd67c	fixed cross attention typeerror	2023-01-07 12:58:41 -08:00
Tri Dao	6738d9477d	[LayerNorm] Implement RMS Norm	2023-01-06 17:34:22 -08:00
Tri Dao	a668890fcd	[Gen] Add option to run generation with FT attention kernel	2023-01-03 22:10:31 -08:00
Tri Dao	4cab4de5ea	[TP] Put parallel embeddings in separate modules	2023-01-02 08:47:48 -08:00
Tri Dao	1ec09ebd90	[FusedDense] Limit matrix dims to 2M (instead of 64k)	2023-01-01 17:06:39 -08:00
Tri Dao	714c1b4f0f	[Bert] Fix embedding layer norm before embedding dropout	2023-01-01 10:38:05 -08:00
Tri Dao	ef1ba918c6	[GPT] Refactor function to shard state_dict for TensorParallel	2023-01-01 00:09:33 -08:00
Tri Dao	65b4064b2a	[FusedDense] Kick off input all_gather before weight dtype conversion	2022-12-31 22:47:34 -08:00
Tri Dao	85b8e3d334	[Docs] Mention that XPos's scale_base is recommended to be 512	2022-12-29 20:25:02 -08:00
Tri Dao	a6ec1782dc	Bump to v0.2.6	2022-12-27 22:05:20 -08:00
Tri Dao	63670fd84a	Implement generation for GPT	2022-12-27 21:01:50 -08:00
Tri Dao	9d797d8848	Support loading GPT2 weights from Huggingface	2022-12-27 11:22:48 -08:00
Tri Dao	c6ecd40a59	Tweak CrossEntropyLoss to take process_group in init	2022-12-27 10:47:43 -08:00
Tri Dao	b4018a5028	Implement Tensor Parallel for GPT model	2022-12-26 16:22:43 -08:00
Tri Dao	78225c5366	Implement Tensor Parallel for GPT2Embeddings	2022-12-25 14:29:53 -08:00
Tri Dao	a8cfe51551	Implement Tensor Parallel for transformer Block	2022-12-25 14:08:21 -08:00
Tri Dao	1e712ea8b0	Implement TensorParallel for MHA	2022-12-25 11:39:55 -08:00
Tri Dao	226a1b721d	Implement TensorParallel for FusedDense and FusedDenseGeluDense	2022-12-24 11:48:56 -08:00
Tri Dao	dff68c2b22	Add smoothing for CrossEntropyParallel, rename to CrossEntropyLoss	2022-12-23 14:51:08 -08:00
Tri Dao	e68ebbe89a	Simplify FusedDense	2022-12-22 21:25:31 -08:00
Tri Dao	496e4f528c	Implement XPos (Sun et al.)	2022-12-21 14:17:58 -08:00
Tri Dao	13cdceb377	Implement last_layer_subset optimization for BERT	2022-12-19 22:18:46 -08:00
Tri Dao	5fb6df0e04	Implement BERT	2022-12-18 21:47:27 -08:00
Alexander Ploshkin	ee8984d2be	add asserts for sin shape	2022-12-17 13:34:57 +04:00
Alexander Ploshkin	c7c66976cc	fix slicing dimensions	2022-12-16 15:39:06 +04:00
Alexander Ploshkin	96656b9323	Remove redundant shape asserts in rotary embeddings	2022-12-15 18:13:21 +04:00

107 Commits