flash-attention

Author	SHA1	Message	Date
Tri Dao	cf4f0a39f3	Merge pull request #241 from ksivaman/fix_compilation_time Fix compilation time	2023-05-25 18:34:41 -04:00
Kirthi Shankar Sivamani	6d45d0bd6c	Re-add ninja Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>	2023-05-25 21:22:50 +00:00
Kirthi Shankar Sivamani	852bc40b8c	Remove torch from pyproject.toml Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>	2023-05-25 19:12:22 +00:00
Kirthi Shankar Sivamani	c1d117c2d0	Remove ninja from pyproject.toml Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>	2023-05-25 19:12:00 +00:00
Tri Dao	f0c40b7ddb	Recommend Nvidia's Pytorch container	2023-05-19 09:41:14 -07:00
Tri Dao	3cad2ab35d	Merge pull request #229 from maxhgerlach/local-version Allow adding an optional local version to the package version	2023-05-19 11:43:24 -04:00
Max H. Gerlach	31f78a9814	Allow adding an optional local version to the package version	2023-05-19 17:27:41 +02:00
Tri Dao	40a25c8ee7	Update roadmap	2023-05-17 08:32:26 -07:00
Tri Dao	eff9fe6b80	Add ninja to pyproject.toml build-system, bump to v1.0.5	2023-05-12 14:20:31 -07:00
Tri Dao	36d0a19f1e	Merge pull request #193 from anthonyhu/pyproject-build Use pyproject.toml to specify build dependencies	2023-05-11 21:26:28 -04:00
Tri Dao	5bf7f57d47	Merge pull request #202 from fedebotu/main [BugFix] avoid bug on ImportError	2023-05-06 14:15:02 -04:00
Federico Berto	69f5f7d0a2	[BugFix] cannot unpack non-iterable NoneType object	2023-05-07 03:07:44 +09:00
Federico Berto	3889ba168b	[BugFix] cannot unpack non-iterable NoneType object	2023-05-07 03:07:30 +09:00
Tri Dao	a9a4b4e4f2	[LLaMa] Fix last norm layer to use RMSNorm instead of LayerNorm	2023-05-04 23:39:43 -07:00
Anthony Hu	d63cfc3551	Use pyproject.toml to specify build dependencies	2023-04-27 11:51:52 +01:00
Tri Dao	ad113948a6	[Docs] Clearer error message for bwd d > 64, bump to v1.0.4	2023-04-26 09:19:48 -07:00
Tri Dao	fbbb107848	Bump version to v1.0.3.post0	2023-04-21 13:37:23 -07:00
Tri Dao	67ef5d28df	Bump version to 1.0.3	2023-04-21 12:04:53 -07:00
Tri Dao	fcab93b43a	[Gen] Minor tweak to allocate_inference_cache	2023-04-21 11:56:47 -07:00
Tri Dao	ba2fe7f378	[Gen] Move allocate_inference_cache to within the model	2023-04-20 18:15:12 -07:00
Tri Dao	3da42d24b1	[GPT] Add option to only return the logit for the last token	2023-04-20 17:21:08 -07:00
Tri Dao	311d6606bf	[Gen] Fix FT kernel smem size, CG when batch size changed	2023-04-20 17:03:13 -07:00
Tri Dao	96d10f6545	Implement LLaMa	2023-04-18 21:51:35 -07:00
Tri Dao	b630aef53f	Implement GatedMlp	2023-04-18 03:37:14 -07:00
Tri Dao	ac3b684cdb	Have a separate nn.Dropout module in SelfAttention module	2023-04-17 22:34:05 -07:00
Tri Dao	df1344f866	Bump to v1.0.2	2023-04-15 22:19:31 -07:00
Tri Dao	635f159ee3	Merge pull request #166 from ksivaman/enable_cuda_graph_capture Enable CUDA graph capture	2023-04-16 00:27:33 -04:00
Kirthi Shankar Sivamani	45567a25a2	only 1 thread writes to global mem in fprop Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>	2023-04-15 06:09:41 +00:00
Kirthi Shankar Sivamani	a0997bc77c	Merge branch 'HazyResearch:main' into enable_cuda_graph_capture	2023-04-14 21:45:37 -07:00
Tri Dao	221a39fd3a	[Docs] Link to Forbes article	2023-04-14 21:20:38 -07:00
Tri Dao	605655bc66	[Gen] Fix FT kernel when using CG	2023-04-14 16:50:01 -07:00
Tri Dao	dceb2687c5	Merge pull request #170 from CrustaceanJ/dependencies Missing module in `setup.py`	2023-04-14 15:41:46 -04:00
Pavel Shvets	72629ac9ba	add missed module	2023-04-14 20:08:24 +03:00
Kirthi Shankar Sivamani	081c2b012a	Merge branch 'HazyResearch:main' into enable_cuda_graph_capture	2023-04-13 19:36:45 -07:00
Tri Dao	1c9ef9b399	[Gen] Measure prompt processing + decoding time, not just decoding	2023-04-13 15:39:56 -07:00
Tri Dao	6f6e9a9aaf	[FusedDense] Enable sqrelu activation in FusedMLP	2023-04-13 15:29:32 -07:00
Kirthi Shankar Sivamani	7d25a4ec4f	Handle FlashAttnQKVPackedSplitFunc by making rng_state optional in backward Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>	2023-04-13 06:25:52 +00:00
Kirthi Shankar Sivamani	315fd31f0c	Merge branch 'HazyResearch:main' into enable_cuda_graph_capture	2023-04-12 22:42:24 -07:00
Tri Dao	5cee071431	Merge pull request #164 from ZhiyuanChen/patch-1 make mlp hidden_features defaults to 4*in_features	2023-04-12 23:21:12 -04:00
Zhiyuan Chen	8c42415664	make mlp hidden_features defaults to 4*in_features	2023-04-13 11:08:21 +08:00
Kirthi Shankar Sivamani	31018c5fa0	Support CUDA graph capture Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>	2023-04-12 16:53:22 -07:00
Tri Dao	853ff72963	Bump version to v1.0.1, fix Cutlass version	2023-04-12 10:05:01 -07:00
Tri Dao	74af023316	Bump version to 1.0.0	2023-04-11 23:32:35 -07:00
Tri Dao	dec4f2e910	[FusedDense] Set workspace size to 32M for Hopper and 4M for others	2023-04-06 23:40:15 -07:00
Tri Dao	d478eeec8f	Merge pull request #154 from kuizhiqing/usage add paddlepaddle in usage	2023-04-04 02:54:37 -04:00
kuizhiqing	c5be8d3aab	add paddlepaddle in usage	2023-04-04 14:15:51 +08:00
Tri Dao	d6fc860573	Merge pull request #147 from ksivaman/add_deterministic_execution_option Add option for deterministic execution	2023-03-31 17:32:50 -04:00
Tri Dao	393882bc08	[LayerNorm] Implement LN with parallel residual, support dim 8k	2023-03-31 14:23:45 -07:00
Kirthi Shankar Sivamani	b6aa059bbf	Add option for deterministic execution	2023-03-30 18:23:35 -07:00
Tri Dao	009a3e71ec	[Training] Fix lightning _PATH import	2023-03-29 01:43:39 -07:00

285 Commits