Commit Graph

57 Commits

Author SHA1 Message Date
Tri Dao
2aea958f89 [CI] Compile with torch 2.3.0.dev20240207 2024-04-07 20:11:52 -07:00
Arvind Sundararajan
26c9e82743 Support ARM builds (#757) 2024-03-13 21:57:20 -07:00
Chirag Jain
50896ec574 Make nvcc threads configurable via environment variable (#885) 2024-03-13 20:46:57 -07:00
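For context, a minimal sketch of the pattern this commit describes: read a thread count from the environment and pass it to nvcc's --threads flag during the extension build. The variable name NVCC_THREADS and the default of 4 are illustrative assumptions, not necessarily the commit's exact choices.

```python
# Sketch: make nvcc's --threads count configurable via an environment
# variable, setup.py-style. NVCC_THREADS and the default of 4 are
# assumptions for illustration.
import os

def nvcc_threads_args():
    nvcc_threads = os.environ.get("NVCC_THREADS", "4")
    return ["--threads", nvcc_threads]

nvcc_flags = ["-O3", "-std=c++17"] + nvcc_threads_args()
```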
Qubitium
f45bbb4c94 Optimize compilation to (1) avoid OOM, (2) minimize swap usage, and (3) avoid thread starvation, whether ninja decides how many workers to spawn or MAX_JOBS is guessed manually. The logic takes the minimum of two auto-calculated MAX_JOBS values, one based on CPU cores and one on free memory, which should let flash-attn compile near-optimally in any consumer or server environment. (#832) 2024-02-17 18:17:15 -08:00
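The job-capping logic this message describes can be sketched as follows; the 9 GB per-job memory budget and the use of psutil are illustrative assumptions, not the commit's exact implementation.

```python
# Sketch: cap MAX_JOBS by both CPU count and free memory, per the logic
# described in #832.
import os
import psutil  # third-party: pip install psutil

def auto_max_jobs(mem_per_job_gb: float = 9.0) -> int:
    # Metric 1: available CPU cores.
    cpu_jobs = os.cpu_count() or 1
    # Metric 2: free memory, assuming a fixed per-job budget (the 9 GB
    # figure is an assumption, not the commit's constant).
    free_gb = psutil.virtual_memory().available / (1024 ** 3)
    mem_jobs = max(1, int(free_gb / mem_per_job_gb))
    # Take the min of the two metrics so neither CPU oversubscription
    # nor memory exhaustion (OOM/swap thrashing) throttles the build.
    return min(cpu_jobs, mem_jobs)

os.environ.setdefault("MAX_JOBS", str(auto_max_jobs()))
```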
Tri Dao
d4a7c8ffbb [CI] Only compile for CUDA 11.8 & 12.2, MAX_JOBS=2, add torch-nightly 2023-11-27 16:21:28 -08:00
Tri Dao
5e525a8dc8 [CI] Use official Pytorch 2.1, add CUDA 11.8 for Pytorch 2.1 2023-10-03 22:20:30 -07:00
Tri Dao
1879e089c7 Reduce number of templates for headdim > 128 2023-09-23 22:24:30 -07:00
Tri Dao
bff3147175 Re-enable compilation for Hopper 2023-09-21 23:55:25 -07:00
Tri Dao
dfe29f5e2b [Gen] Don't use ft_attention, use flash_attn_with_kvcache instead 2023-09-18 15:29:06 -07:00
Federico Berto
fa3ddcbaaa [Minor] Add nvcc note on bare_metal_version RuntimeError (#552)
* Add nvcc note on bare_metal_version `RuntimeError`

* Run Black formatting
2023-09-18 11:48:15 -07:00
Tri Dao
799f56fa90 Don't compile for Pytorch 2.1 on CUDA 12.1 due to nvcc segfaults 2023-09-17 22:15:38 -07:00
Tri Dao
bb9beb3645 Remove some unused headers 2023-09-12 12:37:10 -07:00
Tri Dao
0c04943fa2 Require CUDA 11.6+, clean up setup.py 2023-09-03 21:24:56 -07:00
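A hedged sketch of the kind of check this requirement implies: parse the "bare metal" CUDA version from nvcc and raise a RuntimeError below 11.6 (the same bare_metal_version error the #552 note above documents). Helper names and the exact messages are assumptions.

```python
# Sketch: enforce a minimum CUDA toolkit version at build time.
import re
import subprocess
from packaging.version import Version  # third-party: pip install packaging

def bare_metal_cuda_version() -> Version:
    # Raises FileNotFoundError if nvcc is not on PATH.
    out = subprocess.check_output(["nvcc", "-V"], text=True)
    match = re.search(r"release (\d+\.\d+)", out)
    if match is None:
        raise RuntimeError("Could not parse the CUDA version from `nvcc -V`")
    return Version(match.group(1))

if bare_metal_cuda_version() < Version("11.6"):
    raise RuntimeError("CUDA 11.6 or above is required to build flash-attn")
```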
Tri Dao
b1fbbd8337 Implement splitKV attention 2023-08-29 00:58:29 -07:00
Tri Dao
cbb4cf5f46 Don't need to set TORCH_CUDA_ARCH_LIST in setup.py 2023-08-18 14:18:54 -07:00
Aman Gupta Karmani
aab603af4f Fix binary wheel installation when nvcc is not available (#448) 2023-08-14 14:54:26 -07:00
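A minimal sketch of the guard this fix suggests: only register the from-source CUDA build when nvcc is present, so installing from a prebuilt binary wheel does not fail. The helper name and structure are assumptions.

```python
# Sketch: skip the CUDA extension build when nvcc is absent (cf. #448).
import os
import shutil

def nvcc_available() -> bool:
    if shutil.which("nvcc") is not None:
        return True
    cuda_home = os.environ.get("CUDA_HOME", "")
    return bool(cuda_home) and os.path.exists(
        os.path.join(cuda_home, "bin", "nvcc")
    )

ext_modules = []
if nvcc_available():
    # In a real setup.py, append the CUDAExtension here; without nvcc,
    # setup() proceeds without trying to compile anything.
    pass
```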
Tri Dao
9c531bdc0a Use single-thread compilation for cuda12.1, torch2.1 to avoid OOM in CI 2023-08-14 10:03:31 -07:00
Tri Dao
2ddeaa406c Fix wheel building 2023-08-13 16:48:47 -07:00
Tri Dao
3c458cff77 Merge branch 'feature/demo-wheels' of https://github.com/piercefreeman/flash-attention into piercefreeman-feature/demo-wheels
* 'feature/demo-wheels' of https://github.com/piercefreeman/flash-attention: (25 commits)
  Install standard non-wheel package
  Remove release creation
  Build wheel on each push
  Isolate 2.0.0 & cuda12
  Clean setup.py imports
  Remove builder project
  Bump version
  Add notes to github action workflow
  Add torch dependency to final build
  Exclude cuda erroring builds
  Exclude additional disallowed matrix params
  Full version matrix
  Add CUDA 11.7
  Release is actually unsupported
  echo OS version
  Temp disable deploy
  OS version build numbers
  Restore full build matrix
  Refactor and clean of setup.py
  Strip cuda name from torch version
  ...
2023-08-13 16:03:51 -07:00
Tri Dao
1c41d2b0e5 Fix race condition in bwd (overwriting sK) 2023-08-01 09:00:10 -07:00
Tri Dao
4f285b3547 FlashAttention-2 release 2023-07-17 06:21:34 -07:00
Pierce Freeman
9af165c389 Clean setup.py imports 2023-06-07 17:27:36 -07:00
Pierce Freeman
494b2aa486 Add notes to github action workflow 2023-06-07 17:06:12 -07:00
Pierce Freeman
ea2ed88623 Refactor and clean of setup.py 2023-06-02 18:25:07 -07:00
Pierce Freeman
9fc9820a5b Strip cuda name from torch version 2023-06-02 18:25:07 -07:00
Pierce Freeman
5e4699782a Allow fallback install 2023-06-02 18:25:07 -07:00
Pierce Freeman
0e7769c813 Guessing wheel URL 2023-06-02 18:25:07 -07:00
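Together with "Allow fallback install" above, this commit suggests a guess-then-fallback pattern: construct the expected wheel URL from the Python, torch, and CUDA versions, and fall back to a source build if the download fails. A hedged sketch, with the URL template and host as pure placeholders rather than the project's actual release scheme:

```python
# Sketch: guess a platform-specific wheel URL, fall back on failure.
import platform
import sys
import urllib.error
import urllib.request

def guess_wheel_url(pkg_version: str, torch_version: str, cuda_version: str) -> str:
    # The filename template and host are placeholder assumptions.
    py_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
    plat = f"{platform.system().lower()}_{platform.machine()}"
    return (
        f"https://example.com/releases/v{pkg_version}/"
        f"flash_attn-{pkg_version}+cu{cuda_version}torch{torch_version}"
        f"-{py_tag}-{py_tag}-{plat}.whl"
    )

def install_wheel_or_fallback(url: str) -> bool:
    try:
        urllib.request.urlretrieve(url, url.rsplit("/", 1)[-1])
        return True  # caller installs the downloaded wheel
    except urllib.error.URLError:
        return False  # caller falls back to building from source
```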
Pierce Freeman
e1faefce9d Raise cuda error on build 2023-06-02 18:25:07 -07:00
Pierce Freeman
add4f0bc42 Scaffolding for wheel prototype 2023-06-02 18:25:07 -07:00
Max H. Gerlach
31f78a9814 Allow adding an optional local version to the package version 2023-05-19 17:27:41 +02:00
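A minimal sketch of the optional PEP 440 local version segment this commit adds, e.g. "1.0.5+mycompany1"; the environment variable name FLASH_ATTN_LOCAL_VERSION is an assumption for illustration.

```python
# Sketch: append an optional local version segment from the environment.
import os

BASE_VERSION = "1.0.5"
local = os.environ.get("FLASH_ATTN_LOCAL_VERSION")
__version__ = f"{BASE_VERSION}+{local}" if local else BASE_VERSION
```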
Tri Dao
eff9fe6b80 Add ninja to pyproject.toml build-system, bump to v1.0.5 2023-05-12 14:20:31 -07:00
Tri Dao
ad113948a6 [Docs] Clearer error message for bwd d > 64, bump to v1.0.4 2023-04-26 09:19:48 -07:00
Tri Dao
fbbb107848 Bump version to v1.0.3.post0 2023-04-21 13:37:23 -07:00
Tri Dao
67ef5d28df Bump version to 1.0.3 2023-04-21 12:04:53 -07:00
Tri Dao
df1344f866 Bump to v1.0.2 2023-04-15 22:19:31 -07:00
Pavel Shvets
72629ac9ba Add missing module 2023-04-14 20:08:24 +03:00
Tri Dao
853ff72963 Bump version to v1.0.1, fix Cutlass version 2023-04-12 10:05:01 -07:00
Tri Dao
74af023316 Bump version to 1.0.0 2023-04-11 23:32:35 -07:00
Tri Dao
dc08ea1c33 Support H100 for other CUDA extensions 2023-03-15 16:59:27 -07:00
Tri Dao
1b18f1b7a1 Support H100 2023-03-15 14:59:02 -07:00
Tri Dao
33e0860c9c Bump to v0.2.8 2023-01-19 13:17:19 -08:00
Tri Dao
d509832426 [Compilation] Add _NO_HALF2 flags to be consistent with Pytorch (see eb7b89771e/cmake/Dependencies.cmake, L1693) 2023-01-12 22:15:41 -08:00
Tri Dao
ce26d3d73d Bump to v0.2.7 2023-01-06 17:37:30 -08:00
Tri Dao
a6ec1782dc Bump to v0.2.6 2022-12-27 22:05:20 -08:00
Tri Dao
1bc6e5b09c Bump to v0.2.5 2022-12-21 14:33:18 -08:00
Tri Dao
04c4c6106e Bump to v0.2.4 2022-12-14 14:49:26 -08:00
Tri Dao
a1a5d2ee49 Bump to v0.2.3 2022-12-13 01:37:02 -08:00
Tri Dao
d95ee1a95d Speed up compilation by splitting into separate .cu files 2022-11-25 16:30:18 -08:00
Tri Dao
054816177e Bump version to 0.2.1 2022-11-20 22:35:59 -08:00
Tri Dao
d6ef701aa9 Set version to 0.2.0 (instead of 0.2) 2022-11-15 14:15:05 -08:00