Tri Dao
320fb59487
Update citation
2024-05-26 16:09:03 -07:00
Tri Dao
e2e4333c95
Limit to MAX_JOBS=1 with CUDA 12.2
2024-05-26 15:35:49 -07:00
Tri Dao
ce73503578
Bump to 2.5.9
2024-05-26 14:02:11 -07:00
Tri Dao
d732be1e67
Update to Cutlass 3.5
2024-05-26 12:49:33 -07:00
Tri Dao
af627063e3
[CI] Compile for pytorch 2.4.0.dev20240407 (for nvcr 24.05)
2024-05-26 12:41:17 -07:00
Wongboo
40e667236c
Update for python3.12 ( #870 )
2024-05-26 12:34:49 -07:00
Corey James Levinson
beb8b8ba9f
add exception to Timeout Error ( #963 )
When the connection times out, you get URLError: <urlopen error timed out>. In that case, build from source.
2024-05-26 12:33:03 -07:00
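The fallback this commit adds lives in the wheel-download path of setup.py. A minimal sketch of the pattern, not the actual setup.py (the URL and helper names here are hypothetical):

```python
import urllib.error
import urllib.request

WHEEL_URL = "https://example.com/flash_attn-prebuilt.whl"  # hypothetical URL

def build_from_source():
    # Placeholder for compiling the CUDA extension locally.
    print("prebuilt wheel unavailable; building flash-attn from source")

def fetch_or_build(dest="flash_attn.whl"):
    try:
        urllib.request.urlretrieve(WHEEL_URL, dest)
        return dest
    except urllib.error.HTTPError:
        pass  # no prebuilt wheel for this platform/CUDA/torch combination
    except urllib.error.URLError:
        pass  # "<urlopen error timed out>": the case this commit adds handling for
    return build_from_source()
```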
lancerts
22339db185
remove an unused import ( #960 )
2024-05-23 11:12:31 -07:00
Wei Ji
9c0e9ee86d
Move packaging and ninja from install_requires to setup_requires ( #937 )
Set `packaging` and `ninja` as build-time dependencies rather than runtime dependencies.
2024-05-06 09:45:54 -07:00
Tri Dao
9a11f440d3
Bump to v2.5.8
2024-04-26 10:54:52 -07:00
Tri Dao
35060e7450
[CI] Compile for pytorch 2.2.2 and 2.3.0
2024-04-26 10:53:24 -07:00
Tri Dao
ec6d22143b
[CrossEntropy] Change ignored_index -> ignore_index
2024-04-26 10:50:41 -07:00
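After this rename the parameter matches torch.nn.CrossEntropyLoss. A brief usage sketch, assuming flash_attn.losses.cross_entropy.CrossEntropyLoss and a CUDA device:

```python
import torch
from flash_attn.losses.cross_entropy import CrossEntropyLoss

# ignore_index (formerly ignored_index) marks label positions excluded from the loss.
loss_fn = CrossEntropyLoss(ignore_index=-100)
logits = torch.randn(8, 32000, device="cuda", dtype=torch.float16)
labels = torch.randint(0, 32000, (8,), device="cuda")
labels[0] = -100  # this position does not contribute to the loss
loss = loss_fn(logits, labels)
```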
Tri Dao
85881f547f
Bump to v2.5.7
2024-04-07 20:13:05 -07:00
Tri Dao
2aea958f89
[CI] Compile with torch 2.3.0.dev20240207
2024-04-07 20:11:52 -07:00
Tri Dao
656daef4ea
Use Cute's local_tile to get gQ, gK, gV
2024-04-07 20:10:19 -07:00
Tri Dao
9eb3d099c1
Transpose out when swapping seqlen_q and num_groups
2024-04-07 20:10:19 -07:00
Ivan Komarov
f692b98d80
Fix spurious re-compilations of rotary_kernel ( #911 )
All integer parameters are specialized by default, so the two parameters
removed in this commit could lead to kernel re-compilation, even if
they were completely unused.
2024-04-05 13:40:41 -07:00
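A minimal illustration of the issue (not the actual rotary_kernel), assuming Triton and a CUDA device:

```python
# Triton specializes integer arguments by default (e.g. on the value 1 and on
# divisibility by 16), so an unused integer argument whose value varies across
# calls can still trigger recompilation. The fix in #911 simply drops such arguments.
import torch
import triton
import triton.language as tl

@triton.jit
def add_one_kernel(x_ptr, n_elements, unused_int, BLOCK: tl.constexpr):
    pid = tl.program_id(0)
    offsets = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    tl.store(x_ptr + offsets, x + 1, mask=mask)

x = torch.zeros(1024, device="cuda")
grid = (triton.cdiv(x.numel(), 256),)
# Passing unused_int = 1, 7, 16 can compile three separate specializations,
# even though the kernel never reads unused_int.
for unused_int in (1, 7, 16):
    add_one_kernel[grid](x, x.numel(), unused_int, BLOCK=256)
```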
Driss Guessous
23e8fa5a26
Add the option for the macro and note ( #893 )
2024-03-27 19:12:11 -07:00
ljss
3e9414f1c3
Minor fix in compute_attn_1rowblock_splitkv ( #900 )
2024-03-27 19:11:45 -07:00
Tri Dao
36587c01cb
[LayerNorm] Update layer_norm_linear
2024-03-18 23:15:33 -07:00
Markus Krimmel
6bbc532388
fix: cast the alibi slopes to torch.float32 ( #846 )
2024-03-15 00:49:40 -07:00
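A short sketch of the resulting calling convention, assuming flash_attn_func with the alibi_slopes argument and a CUDA device; the slope formula is only illustrative:

```python
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 1024, 8, 64
q = torch.randn(batch, seqlen, nheads, headdim, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

# Geometric ALiBi slopes (illustrative); cast to float32 regardless of how they were built,
# which is what the fix in #846 does.
alibi_slopes = torch.tensor([2 ** (-8 * (i + 1) / nheads) for i in range(nheads)],
                            device="cuda").to(torch.float32)
out = flash_attn_func(q, k, v, causal=True, alibi_slopes=alibi_slopes)
```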
Driss Guessous
4a73e903da
Add in macros for defining __grid_constant__ ( #852 )
2024-03-15 00:48:54 -07:00
Grigory Sizov
2a15840f09
Enable paged attention in varlen forward ( #831 )
* Enable paged attention in varlen forward
* Format + fix padding
2024-03-15 00:48:19 -07:00
Arvind Sundararajan
26c9e82743
Support ARM builds ( #757 )
2024-03-13 21:57:20 -07:00
Chirag Jain
50896ec574
Make nvcc threads configurable via environment variable ( #885 )
2024-03-13 20:46:57 -07:00
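A sketch of the pattern; the variable name NVCC_THREADS and the default of 4 are assumptions, not quoted from the PR:

```python
import os

def nvcc_threads_args():
    """Extra nvcc flags controlling its internal compilation thread count."""
    nvcc_threads = os.getenv("NVCC_THREADS") or "4"  # assumed variable name and default
    return ["--threads", nvcc_threads]

nvcc_flags = ["-O3", "--use_fast_math"] + nvcc_threads_args()
print(nvcc_flags)
```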
Tri Dao
6c9e60de56
Bump to v2.5.6
2024-03-01 22:09:56 -08:00
Tri Dao
6e2fa30797
[CI] Change torch 2.3.0.dev20240126 to 20240105 for nvcr 24.02
2024-03-01 22:08:10 -08:00
Tri Dao
87a1277653
Bump to v2.5.5
2024-02-21 15:58:23 -08:00
Tri Dao
2406f28805
Enable headdim 256 backward on consumer GPUs (Ampere, Ada)
2024-02-21 15:56:19 -08:00
Tri Dao
43950dda45
Bump to v2.5.4
2024-02-20 16:30:16 -08:00
Tri Dao
4d6b794b3c
Update Cutlass to v3.4.1
2024-02-20 16:28:21 -08:00
Tri Dao
b32efb1a4d
Don't need to reduce row_sum during online softmax
2024-02-20 13:33:38 -08:00
Qubitium
f45bbb4c94
Optimize compilation to (1) avoid OOM, (2) minimize swap usage, and (3) avoid thread starvation when ninja decides how many workers to spawn or MAX_JOBS is guessed manually. The logic takes the minimum of the MAX_JOBS values auto-calculated from two metrics: (1) CPU cores and (2) free memory. This should let flash-attn compile in close to the most efficient manner under any consumer/server environment. ( #832 )
2024-02-17 18:17:15 -08:00
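A minimal sketch of that heuristic; psutil and the per-job memory budget are assumptions for illustration, not values taken from the commit:

```python
import os
import psutil  # assumption: used here only to read available memory

def auto_max_jobs(mem_per_job_gb: float = 9.0) -> int:
    """Return min(MAX_JOBS by CPU cores, MAX_JOBS by free memory), at least 1."""
    cpu_jobs = os.cpu_count() or 1
    free_gb = psutil.virtual_memory().available / (1024 ** 3)
    mem_jobs = max(1, int(free_gb / mem_per_job_gb))  # rough per-job budget (assumed)
    return max(1, min(cpu_jobs, mem_jobs))

# Respect an explicit MAX_JOBS if the user set one; otherwise auto-calculate.
os.environ.setdefault("MAX_JOBS", str(auto_max_jobs()))
```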
Tri Dao
5cdabc2809
Bump to v2.5.3
2024-02-10 01:06:27 -08:00
Tri Dao
d9a5cb291c
Fix dv = torch::empty_like(k) for mha_bwd_varlen as well
2024-02-10 01:03:00 -08:00
Tri Dao
a190df011c
Add window_size option to ParallelMHA
2024-02-10 01:02:14 -08:00
Brian Hirsh
2423cca3ad
fix backward for when query and key have different contiguity ( #818 )
2024-02-10 01:01:27 -08:00
Grigory Sizov
4687936413
Fix Windows build ( #816 )
2024-02-07 17:41:53 -08:00
Tri Dao
61a7772479
Bump to v2.5.2
2024-01-31 02:44:24 -08:00
Tri Dao
6a5c053c3e
[CI] Compile with torch 2.2.0 instead of 2.2.0.dev20231106
2024-01-31 02:43:12 -08:00
Tri Dao
ef0ed10622
Add window_size option to MHA and GPT
2024-01-31 02:42:23 -08:00
Tri Dao
dc72d960a7
[CI] Install torch 2.3 using index
2024-01-30 14:32:29 -08:00
Tri Dao
daf37a9d8a
Bump to v2.5.1
2024-01-29 21:03:38 -08:00
Tri Dao
aa2eb8ddf2
[CI] Compile with pytorch 2.2.0.dev20231106
2024-01-29 20:49:18 -08:00
Jeremy Reizenstein
0658e320f6
Preprocessor switches to control functionality ( #788 )
For faster and smaller builds in some simple cases,
provide switches to allow disabling
-backward
-alibi
-uneven k
-dropout
-local attention
Co-authored-by: Jeremy Francis Reizenstein <bottler@users.noreply.github.com>
2024-01-29 20:44:23 -08:00
Christian Kadner
290596c544
[CI] Build wheels for Pytorch 2.3 (dev/nightly) ( #793 )
* [CI] Build wheels for Pytorch 2.3 (dev/nightly)
Resolves #790
* update TORCH_CUDA_VERSION
* revert torch 2.2 back to dev20231130
* add link to PyTorch compatibility matrix
Signed-off-by: Christian Kadner <ckadner@us.ibm.com>
2024-01-29 17:53:38 -08:00
Avelina9X
c94cd09744
Updated missing docstrings for args and returns in bert_padding.py ( #795 )
* Updated docstrings of bert_padding.py
Added docstrings for missing arguments in the unpad and pad methods.
* Update bert_padding.py
Fixed spelling mistakes
2024-01-27 09:16:25 -08:00
Tri Dao
ffc8682dd5
Add benchmarking code for Alibi (from Sanghun Cho)
2024-01-23 19:00:49 -08:00
Tao He
204c3c6d1b
Fixes an error in a comment ( #785 )
Signed-off-by: Tao He <sighingnow@gmail.com>
2024-01-23 12:38:29 -08:00
Tri Dao
197f2083a2
Bump to v2.5.0
2024-01-22 23:40:10 -08:00