Commit Graph

57 Commits

Author SHA1 Message Date
Tri Dao
2aea958f89 [CI] Compile with torch 2.3.0.dev20240207 2024-04-07 20:11:52 -07:00
Arvind Sundararajan
26c9e82743 Support ARM builds (#757) 2024-03-13 21:57:20 -07:00
Chirag Jain
50896ec574 Make nvcc threads configurable via environment variable (#885) 2024-03-13 20:46:57 -07:00
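For context, a minimal sketch of the pattern this commit describes: read a thread count from the environment and pass it to nvcc's --threads flag during the extension build. The variable name NVCC_THREADS and the default of 4 are illustrative assumptions, not necessarily the commit's exact choices.

```python
# Sketch: make nvcc's --threads count configurable via an environment
# variable, setup.py-style. NVCC_THREADS and the default of 4 are
# assumptions for illustration.
import os

def nvcc_threads_args():
    nvcc_threads = os.environ.get("NVCC_THREADS", "4")
    return ["--threads", nvcc_threads]

nvcc_flags = ["-O3", "-std=c++17"] + nvcc_threads_args()
```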
Qubitium
f45bbb4c94 Optimize compilation to (1) avoid OOM, (2) minimize swap usage, and (3) avoid thread starvation, whether ninja decides how many workers to spawn or MAX_JOBS is guessed manually. The logic takes the minimum of two auto-calculated MAX_JOBS values, one based on CPU cores and one on free memory, which should let flash-attn compile near-optimally in any consumer or server environment. (#832) 2024-02-17 18:17:15 -08:00
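The job-capping logic this message describes can be sketched as follows; the 9 GB per-job memory budget and the use of psutil are illustrative assumptions, not the commit's exact implementation.

```python
# Sketch: cap MAX_JOBS by both CPU count and free memory, per the logic
# described in #832.
import os
import psutil  # third-party: pip install psutil

def auto_max_jobs(mem_per_job_gb: float = 9.0) -> int:
    # Metric 1: available CPU cores.
    cpu_jobs = os.cpu_count() or 1
    # Metric 2: free memory, assuming a fixed per-job budget (the 9 GB
    # figure is an assumption, not the commit's constant).
    free_gb = psutil.virtual_memory().available / (1024 ** 3)
    mem_jobs = max(1, int(free_gb / mem_per_job_gb))
    # Take the min of the two metrics so neither CPU oversubscription
    # nor memory exhaustion (OOM/swap thrashing) throttles the build.
    return min(cpu_jobs, mem_jobs)

os.environ.setdefault("MAX_JOBS", str(auto_max_jobs()))
```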
Tri Dao
d4a7c8ffbb [CI] Only compile for CUDA 11.8 & 12.2, MAX_JOBS=2, add torch-nightly 2023-11-27 16:21:28 -08:00
Tri Dao
5e525a8dc8 [CI] Use official Pytorch 2.1, add CUDA 11.8 for Pytorch 2.1 2023-10-03 22:20:30 -07:00
Tri Dao
1879e089c7 Reduce number of templates for headdim > 128 2023-09-23 22:24:30 -07:00
Tri Dao
bff3147175 Re-enable compilation for Hopper 2023-09-21 23:55:25 -07:00
Tri Dao
dfe29f5e2b [Gen] Don't use ft_attention, use flash_attn_with_kvcache instead 2023-09-18 15:29:06 -07:00
Federico Berto
fa3ddcbaaa [Minor] Add nvcc note on bare_metal_version RuntimeError (#552)
* Add nvcc note on bare_metal_version `RuntimeError`

* Run Black formatting
2023-09-18 11:48:15 -07:00
Tri Dao
799f56fa90 Don't compile for Pytorch 2.1 on CUDA 12.1 due to nvcc segfaults 2023-09-17 22:15:38 -07:00
Tri Dao
bb9beb3645 Remove some unused headers 2023-09-12 12:37:10 -07:00
Tri Dao
0c04943fa2 Require CUDA 11.6+, clean up setup.py 2023-09-03 21:24:56 -07:00
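A hedged sketch of the kind of check this requirement implies: parse the "bare metal" CUDA version from nvcc and raise a RuntimeError below 11.6 (the same bare_metal_version error the #552 note above documents). Helper names and the exact messages are assumptions.

```python
# Sketch: enforce a minimum CUDA toolkit version at build time.
import re
import subprocess
from packaging.version import Version  # third-party: pip install packaging

def bare_metal_cuda_version() -> Version:
    # Raises FileNotFoundError if nvcc is not on PATH.
    out = subprocess.check_output(["nvcc", "-V"], text=True)
    match = re.search(r"release (\d+\.\d+)", out)
    if match is None:
        raise RuntimeError("Could not parse the CUDA version from `nvcc -V`")
    return Version(match.group(1))

if bare_metal_cuda_version() < Version("11.6"):
    raise RuntimeError("CUDA 11.6 or above is required to build flash-attn")
```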
Tri Dao
b1fbbd8337 Implement splitKV attention 2023-08-29 00:58:29 -07:00
Tri Dao
cbb4cf5f46 Don't need to set TORCH_CUDA_ARCH_LIST in setup.py 2023-08-18 14:18:54 -07:00
Aman Gupta Karmani
aab603af4f Fix binary wheel installation when nvcc is not available (#448) 2023-08-14 14:54:26 -07:00
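A minimal sketch of the guard this fix suggests: only register the from-source CUDA build when nvcc is present, so installing from a prebuilt binary wheel does not fail. The helper name and structure are assumptions.

```python
# Sketch: skip the CUDA extension build when nvcc is absent (cf. #448).
import os
import shutil

def nvcc_available() -> bool:
    if shutil.which("nvcc") is not None:
        return True
    cuda_home = os.environ.get("CUDA_HOME", "")
    return bool(cuda_home) and os.path.exists(
        os.path.join(cuda_home, "bin", "nvcc")
    )

ext_modules = []
if nvcc_available():
    # In a real setup.py, append the CUDAExtension here; without nvcc,
    # setup() proceeds without trying to compile anything.
    pass
```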
Tri Dao
9c531bdc0a Use single-thread compilation for cuda12.1, torch2.1 to avoid OOM in CI 2023-08-14 10:03:31 -07:00
Tri Dao
2ddeaa406c Fix wheel building 2023-08-13 16:48:47 -07:00
Tri Dao
3c458cff77 Merge branch 'feature/demo-wheels' of https://github.com/piercefreeman/flash-attention into piercefreeman-feature/demo-wheels
* 'feature/demo-wheels' of https://github.com/piercefreeman/flash-attention: (25 commits)
  Install standard non-wheel package
  Remove release creation
  Build wheel on each push
  Isolate 2.0.0 & cuda12
  Clean setup.py imports
  Remove builder project
  Bump version
  Add notes to github action workflow
  Add torch dependency to final build
  Exclude cuda erroring builds
  Exclude additional disallowed matrix params
  Full version matrix
  Add CUDA 11.7
  Release is actually unsupported
  echo OS version
  Temp disable deploy
  OS version build numbers
  Restore full build matrix
  Refactor and clean of setup.py
  Strip cuda name from torch version
  ...
2023-08-13 16:03:51 -07:00
Tri Dao
1c41d2b0e5 Fix race condition in bwd (overwriting sK) 2023-08-01 09:00:10 -07:00
Tri Dao
4f285b3547 FlashAttention-2 release 2023-07-17 06:21:34 -07:00
Pierce Freeman
9af165c389 Clean setup.py imports 2023-06-07 17:27:36 -07:00
Pierce Freeman
494b2aa486 Add notes to github action workflow 2023-06-07 17:06:12 -07:00
Pierce Freeman
ea2ed88623 Refactor and clean of setup.py 2023-06-02 18:25:07 -07:00
Pierce Freeman
9fc9820a5b Strip cuda name from torch version 2023-06-02 18:25:07 -07:00
Pierce Freeman
5e4699782a Allow fallback install 2023-06-02 18:25:07 -07:00
Pierce Freeman
0e7769c813 Guessing wheel URL 2023-06-02 18:25:07 -07:00
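Together with "Allow fallback install" above, this commit suggests a guess-then-fallback pattern: construct the expected wheel URL from the Python, torch, and CUDA versions, and fall back to a source build if the download fails. A hedged sketch, with the URL template and host as pure placeholders rather than the project's actual release scheme:

```python
# Sketch: guess a platform-specific wheel URL, fall back on failure.
import platform
import sys
import urllib.error
import urllib.request

def guess_wheel_url(pkg_version: str, torch_version: str, cuda_version: str) -> str:
    # The filename template and host are placeholder assumptions.
    py_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
    plat = f"{platform.system().lower()}_{platform.machine()}"
    return (
        f"https://example.com/releases/v{pkg_version}/"
        f"flash_attn-{pkg_version}+cu{cuda_version}torch{torch_version}"
        f"-{py_tag}-{py_tag}-{plat}.whl"
    )

def install_wheel_or_fallback(url: str) -> bool:
    try:
        urllib.request.urlretrieve(url, url.rsplit("/", 1)[-1])
        return True  # caller installs the downloaded wheel
    except urllib.error.URLError:
        return False  # caller falls back to building from source
```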
Pierce Freeman
e1faefce9d Raise cuda error on build 2023-06-02 18:25:07 -07:00
Pierce Freeman
add4f0bc42 Scaffolding for wheel prototype 2023-06-02 18:25:07 -07:00
Max H. Gerlach
31f78a9814 Allow adding an optional local version to the package version 2023-05-19 17:27:41 +02:00
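A minimal sketch of the optional PEP 440 local version segment this commit adds, e.g. "1.0.5+mycompany1"; the environment variable name FLASH_ATTN_LOCAL_VERSION is an assumption for illustration.

```python
# Sketch: append an optional local version segment from the environment.
import os

BASE_VERSION = "1.0.5"
local = os.environ.get("FLASH_ATTN_LOCAL_VERSION")
__version__ = f"{BASE_VERSION}+{local}" if local else BASE_VERSION
```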
Tri Dao
eff9fe6b80 Add ninja to pyproject.toml build-system, bump to v1.0.5 2023-05-12 14:20:31 -07:00
Tri Dao
ad113948a6 [Docs] Clearer error message for bwd d > 64, bump to v1.0.4 2023-04-26 09:19:48 -07:00
Tri Dao
fbbb107848 Bump version to v1.0.3.post0 2023-04-21 13:37:23 -07:00
Tri Dao
67ef5d28df Bump version to 1.0.3 2023-04-21 12:04:53 -07:00
Tri Dao
df1344f866 Bump to v1.0.2 2023-04-15 22:19:31 -07:00
Pavel Shvets
72629ac9ba Add missing module 2023-04-14 20:08:24 +03:00
Tri Dao
853ff72963 Bump version to v1.0.1, fix Cutlass version 2023-04-12 10:05:01 -07:00
Tri Dao
74af023316 Bump version to 1.0.0 2023-04-11 23:32:35 -07:00
Tri Dao
dc08ea1c33 Support H100 for other CUDA extensions 2023-03-15 16:59:27 -07:00
Tri Dao
1b18f1b7a1 Support H100 2023-03-15 14:59:02 -07:00
Tri Dao
33e0860c9c Bump to v0.2.8 2023-01-19 13:17:19 -08:00
Tri Dao
d509832426 [Compilation] Add _NO_HALF2 flags to be consistent with Pytorch (see eb7b89771e/cmake/Dependencies.cmake, L1693) 2023-01-12 22:15:41 -08:00
Tri Dao
ce26d3d73d Bump to v0.2.7 2023-01-06 17:37:30 -08:00
Tri Dao
a6ec1782dc Bump to v0.2.6 2022-12-27 22:05:20 -08:00
Tri Dao
1bc6e5b09c Bump to v0.2.5 2022-12-21 14:33:18 -08:00
Tri Dao
04c4c6106e Bump to v0.2.4 2022-12-14 14:49:26 -08:00
Tri Dao
a1a5d2ee49 Bump to v0.2.3 2022-12-13 01:37:02 -08:00
Tri Dao
d95ee1a95d Speed up compilation by splitting into separate .cu files 2022-11-25 16:30:18 -08:00
Tri Dao
054816177e Bump version to 0.2.1 2022-11-20 22:35:59 -08:00
Tri Dao
d6ef701aa9 Set version to 0.2.0 (instead of 0.2) 2022-11-15 14:15:05 -08:00