Author | Commit | Message | Date
Chirag Jain | 50896ec574 | Make nvcc threads configurable via environment variable (#885) | 2024-03-13 20:46:57 -07:00
Tri Dao | abbc131173 | [LayerNorm] Switch from CUDA to Triton implementation | 2024-01-05 00:31:17 -08:00
Joel Lamy-Poirier | 767b71ccf0 | Fix random state for dropout_layer_norm (#315) | 2023-07-23 15:05:13 -07:00
Ikko Eltociear Ashimine | dfc60f6b7d | [LayerNorm] Fix typo in ln_api.cpp (unintialized -> uninitialized) | 2023-07-20 01:16:16 +09:00
Tri Dao | 393882bc08 | [LayerNorm] Implement LN with parallel residual, support dim 8k | 2023-03-31 14:23:45 -07:00
Tri Dao | dc08ea1c33 | Support H100 for other CUDA extensions | 2023-03-15 16:59:27 -07:00
Tri Dao | eb33e587e9 | [LayerNorm] Rename x1 -> residual | 2023-01-19 13:07:27 -08:00
Tri Dao | 6738d9477d | [LayerNorm] Implement RMS Norm | 2023-01-06 17:34:22 -08:00
Tri Dao | a8cfe51551 | Implement Tensor Parallel for transformer Block | 2022-12-25 14:08:21 -08:00
Tri Dao | 5db330519a | [LayerNorm] Support taking subset of input or subset of output | 2022-12-12 22:16:14 -08:00
Tri Dao | ae137ed17a | [LayerNorm] Fuse LayerScale | 2022-12-10 23:28:23 -08:00
Tri Dao | 8c6609ae1a | [LayerNorm] Support all dimensions up to 6k (if divisible by 8) | 2022-12-09 02:06:22 -08:00
Tri Dao | 0bf5e50038 | Release training code | 2022-11-28 17:34:40 -08:00
Tri Dao | 39ed597b28 | [LayerNorm] Compile for both sm70 and sm80 | 2022-11-17 11:45:11 -08:00
Tri Dao | 43ab0b5205 | Mention that some CUDA extensions have only been tested on A100s | 2022-11-15 07:10:25 -08:00
Tri Dao | e4d3013e15 | [LayerNorm] Check cuda error after querying ctas_per_sm | 2022-11-15 07:05:13 -08:00
Tri Dao | 2e33fc8e36 | Add GPT and ViT models | 2022-11-13 22:30:23 -08:00
Tri Dao | fa6d1ce44f | Add fused_dense and dropout_add_layernorm CUDA extensions | 2022-11-13 21:59:20 -08:00