Tri Dao
2aea958f89
[CI] Compile with torch 2.3.0.dev20240207
2024-04-07 20:11:52 -07:00
Arvind Sundararajan
26c9e82743
Support ARM builds ( #757 )
2024-03-13 21:57:20 -07:00
Chirag Jain
50896ec574
Make nvcc threads configurable via environment variable ( #885 )
2024-03-13 20:46:57 -07:00
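The commit above makes the nvcc thread count user-tunable. A minimal sketch of how such a setup.py helper might read the count from an environment variable (the variable name and default here are assumptions for illustration):

```python
import os

def nvcc_thread_flags(default: int = 4) -> list:
    """Build nvcc's --threads flag, letting users override the
    parallelism via an environment variable (name is an assumption)."""
    threads = os.getenv("NVCC_THREADS") or str(default)
    return ["--threads", threads]
```

The returned flags would then be appended to the nvcc portion of `extra_compile_args` when building the CUDA extension.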
Qubitium
f45bbb4c94
Optimize compilation: (1) avoid OOM, (2) minimize swap usage, (3) avoid thread starvation when letting ninja decide how many workers to spawn or when MAX_JOBS is guessed manually. The logic takes the minimum of two auto-calculated MAX_JOBS values, one based on CPU cores and one on free memory. This should let flash-attn compile close to maximum efficiency under any consumer or server environment. ( #832 )
2024-02-17 18:17:15 -08:00
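The MAX_JOBS heuristic described in the commit above, taking the minimum of a CPU-based cap and a free-memory-based cap, could be sketched as follows. The per-job memory footprint and the CPU divisor are assumed values, not the commit's exact constants:

```python
import os

def auto_max_jobs(free_mem_gb: float, mem_per_job_gb: float = 9.0) -> int:
    """Pick MAX_JOBS as min(cpu cap, memory cap) so ninja neither
    starves threads nor drives the machine into OOM/swap.
    mem_per_job_gb is an assumed per-compile-job memory footprint."""
    by_cpu = max(1, (os.cpu_count() or 1) // 2)  # leave CPU headroom; divisor is an assumption
    by_mem = max(1, int(free_mem_gb / mem_per_job_gb))
    return min(by_cpu, by_mem)
```

On a memory-constrained box the memory cap dominates; on a high-memory box the CPU cap does, which matches the commit's goal of behaving sensibly in both consumer and server environments.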
Tri Dao
d4a7c8ffbb
[CI] Only compile for CUDA 11.8 & 12.2, MAX_JOBS=2, add torch-nightly
2023-11-27 16:21:28 -08:00
Tri Dao
5e525a8dc8
[CI] Use official PyTorch 2.1, add CUDA 11.8 for PyTorch 2.1
2023-10-03 22:20:30 -07:00
Tri Dao
1879e089c7
Reduce number of templates for headdim > 128
2023-09-23 22:24:30 -07:00
Tri Dao
bff3147175
Re-enable compilation for Hopper
2023-09-21 23:55:25 -07:00
Tri Dao
dfe29f5e2b
[Gen] Don't use ft_attention, use flash_attn_with_kvcache instead
2023-09-18 15:29:06 -07:00
Federico Berto
fa3ddcbaaa
[Minor] add nvcc note on bare_metal_version RuntimeError ( #552 )
...
* Add nvcc note on bare_metal_version `RuntimeError`
* Run Black formatting
2023-09-18 11:48:15 -07:00
Tri Dao
799f56fa90
Don't compile for PyTorch 2.1 on CUDA 12.1 due to nvcc segfaults
2023-09-17 22:15:38 -07:00
Tri Dao
bb9beb3645
Remove some unused headers
2023-09-12 12:37:10 -07:00
Tri Dao
0c04943fa2
Require CUDA 11.6+, clean up setup.py
2023-09-03 21:24:56 -07:00
Tri Dao
b1fbbd8337
Implement splitKV attention
2023-08-29 00:58:29 -07:00
Tri Dao
cbb4cf5f46
Don't need to set TORCH_CUDA_ARCH_LIST in setup.py
2023-08-18 14:18:54 -07:00
Aman Gupta Karmani
aab603af4f
fix binary wheel installation when nvcc is not available ( #448 )
2023-08-14 14:54:26 -07:00
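The fix above concerns installing a prebuilt binary wheel on machines without the CUDA toolchain. A hypothetical guard (the function names are illustrative, not the PR's actual code) could check for nvcc on PATH before attempting a source build:

```python
import shutil

def cuda_compiler_available() -> bool:
    """Return True if nvcc is on PATH; a prebuilt wheel can install
    without it, so its absence should not be a hard error."""
    return shutil.which("nvcc") is not None

def should_build_from_source(force_build: bool = False) -> bool:
    # Only attempt a source build when nvcc exists (or the user forces
    # it); otherwise fall back to the prebuilt binary wheel path.
    return force_build or cuda_compiler_available()
```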
Tri Dao
9c531bdc0a
Use single thread compilation for cuda12.1, torch2.1 to avoid OOM CI
2023-08-14 10:03:31 -07:00
Tri Dao
2ddeaa406c
Fix wheel building
2023-08-13 16:48:47 -07:00
Tri Dao
3c458cff77
Merge branch 'feature/demo-wheels' of https://github.com/piercefreeman/flash-attention into piercefreeman-feature/demo-wheels
...
* 'feature/demo-wheels' of https://github.com/piercefreeman/flash-attention : (25 commits)
Install standard non-wheel package
Remove release creation
Build wheel on each push
Isolate 2.0.0 & cuda12
Clean setup.py imports
Remove builder project
Bump version
Add notes to github action workflow
Add torch dependency to final build
Exclude cuda erroring builds
Exclude additional disallowed matrix params
Full version matrix
Add CUDA 11.7
Release is actually unsupported
echo OS version
Temp disable deploy
OS version build numbers
Restore full build matrix
Refactor and clean of setup.py
Strip cuda name from torch version
...
2023-08-13 16:03:51 -07:00
Tri Dao
1c41d2b0e5
Fix race condition in bwd (overwriting sK)
2023-08-01 09:00:10 -07:00
Tri Dao
4f285b3547
FlashAttention-2 release
2023-07-17 06:21:34 -07:00
Pierce Freeman
9af165c389
Clean setup.py imports
2023-06-07 17:27:36 -07:00
Pierce Freeman
494b2aa486
Add notes to github action workflow
2023-06-07 17:06:12 -07:00
Pierce Freeman
ea2ed88623
Refactor and clean of setup.py
2023-06-02 18:25:07 -07:00
Pierce Freeman
9fc9820a5b
Strip cuda name from torch version
2023-06-02 18:25:07 -07:00
Pierce Freeman
5e4699782a
Allow fallback install
2023-06-02 18:25:07 -07:00
Pierce Freeman
0e7769c813
Guessing wheel URL
2023-06-02 18:25:07 -07:00
Pierce Freeman
e1faefce9d
Raise cuda error on build
2023-06-02 18:25:07 -07:00
Pierce Freeman
add4f0bc42
Scaffolding for wheel prototype
2023-06-02 18:25:07 -07:00
Max H. Gerlach
31f78a9814
Allow adding an optional local version to the package version
2023-05-19 17:27:41 +02:00
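An optional local version, as in the commit above, is the PEP 440 `+label` suffix (e.g. `1.0.5+cu118`). A minimal sketch reading it from an environment variable, whose name here is an assumption:

```python
import os

def package_version(base: str) -> str:
    """Append an optional PEP 440 local version segment ("+label")
    taken from an environment variable (name is an assumption)."""
    local = os.getenv("FLASH_ATTN_LOCAL_VERSION")
    return f"{base}+{local}" if local else base
```

This lets wheel builders tag e.g. the CUDA variant into the version string without changing the base version in the source tree.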
Tri Dao
eff9fe6b80
Add ninja to pyproject.toml build-system, bump to v1.0.5
2023-05-12 14:20:31 -07:00
Tri Dao
ad113948a6
[Docs] Clearer error message for bwd d > 64, bump to v1.0.4
2023-04-26 09:19:48 -07:00
Tri Dao
fbbb107848
Bump version to v1.0.3.post0
2023-04-21 13:37:23 -07:00
Tri Dao
67ef5d28df
Bump version to 1.0.3
2023-04-21 12:04:53 -07:00
Tri Dao
df1344f866
Bump to v1.0.2
2023-04-15 22:19:31 -07:00
Pavel Shvets
72629ac9ba
Add missing module
2023-04-14 20:08:24 +03:00
Tri Dao
853ff72963
Bump version to v1.0.1, fix Cutlass version
2023-04-12 10:05:01 -07:00
Tri Dao
74af023316
Bump version to 1.0.0
2023-04-11 23:32:35 -07:00
Tri Dao
dc08ea1c33
Support H100 for other CUDA extensions
2023-03-15 16:59:27 -07:00
Tri Dao
1b18f1b7a1
Support H100
2023-03-15 14:59:02 -07:00
Tri Dao
33e0860c9c
Bump to v0.2.8
2023-01-19 13:17:19 -08:00
Tri Dao
d509832426
[Compilation] Add _NO_HALF2 flags to be consistent with PyTorch
...
eb7b89771e/cmake/Dependencies.cmake (L1693)
2023-01-12 22:15:41 -08:00
Tri Dao
ce26d3d73d
Bump to v0.2.7
2023-01-06 17:37:30 -08:00
Tri Dao
a6ec1782dc
Bump to v0.2.6
2022-12-27 22:05:20 -08:00
Tri Dao
1bc6e5b09c
Bump to v0.2.5
2022-12-21 14:33:18 -08:00
Tri Dao
04c4c6106e
Bump to v0.2.4
2022-12-14 14:49:26 -08:00
Tri Dao
a1a5d2ee49
Bump to v0.2.3
2022-12-13 01:37:02 -08:00
Tri Dao
d95ee1a95d
Speed up compilation by splitting into separate .cu files
2022-11-25 16:30:18 -08:00
Tri Dao
054816177e
Bump version to 0.2.1
2022-11-20 22:35:59 -08:00
Tri Dao
d6ef701aa9
Set version to 0.2.0 (instead of 0.2)
2022-11-15 14:15:05 -08:00