Tri Dao
74b0761ff7
[FA3] BF16 forward
2024-07-14 23:39:46 -07:00
Tri Dao
898dd4bbf2
Pass seqused_k to _flash_attn_varlen_forward
2024-07-13 00:08:27 -07:00
Tri Dao
7ef24848cf
Add FA3 image
2024-07-11 09:54:05 -07:00
Tri Dao
7f67966cc7
FA3 initial code release
2024-07-11 09:53:36 -07:00
Tri Dao
b4a9dd6c9c
Temporarily switch to cutlass fork for more shapes
2024-07-11 09:29:21 -07:00
Tri Dao
7551202cb2
Bump to v2.6.1
2024-07-11 08:28:32 -07:00
Tri Dao
844912dca0
[CI] Switch from CUDA 12.2 to 12.3
2024-07-11 08:20:09 -07:00
Tri Dao
40e534a7f6
Implement cache_leftpad
2024-07-11 08:17:15 -07:00
Tri Dao
116b05f9b0
[CI] Compile with pytorch 2.4.0.dev20240514
2024-07-11 02:53:30 -07:00
Tri Dao
da11d1b853
Bump v2.6.0
2024-07-10 21:34:58 -07:00
Tri Dao
d0787acc16
Relax dropout_fraction test
2024-07-10 11:49:40 -07:00
Tri Dao
dca6d89da4
Don't support softcap and dropout at the same time
...
These tests are failing so I'm just disabling this case for now
2024-07-10 11:23:12 -07:00
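For context, a minimal sketch of what this restriction means at the Python API level (tensor shapes and the softcap value are illustrative; `softcap` and `dropout_p` are the existing flash_attn_func keywords):
```
import torch
from flash_attn import flash_attn_func

q = torch.randn(1, 128, 8, 64, device="cuda", dtype=torch.bfloat16)
k = torch.randn(1, 128, 8, 64, device="cuda", dtype=torch.bfloat16)
v = torch.randn(1, 128, 8, 64, device="cuda", dtype=torch.bfloat16)

out = flash_attn_func(q, k, v, softcap=30.0)    # softcapping alone
out = flash_attn_func(q, k, v, dropout_p=0.1)   # dropout alone
# Per this commit, combining dropout_p > 0 with softcap > 0 is not supported.
```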
Tri Dao
81e01efd4b
More typo fixes
2024-07-10 10:19:17 -07:00
Tri Dao
72e27c6320
Fix typo with softcapping
2024-07-10 00:33:52 -07:00
Tri Dao
3d41db3e2c
Only test backward if there's no softcapping
2024-07-10 00:27:45 -07:00
Tri Dao
908511b2b6
Split into more .cu files to speed up compilation
2024-07-10 00:24:04 -07:00
Tri Dao
1d536d7de5
Minor cleanup of softcapping
2024-07-09 22:57:03 -07:00
Tri Dao
beb2bf2a32
Drop support for pytorch 1.12, 1.13, and python 3.7
2024-07-09 22:13:15 -07:00
Phil Wang
f4628b43ec
missing commas and backwards return arguments ( #1032 )
...
* missing commas
* another fix
2024-07-09 10:56:29 -07:00
Nicolas Patry
8f873cc6ac
Implement softcapping. ( #1025 )
...
* Softcap v2 (fwd only).
* Some missing interface + remove overrides in tests.
2024-07-08 11:24:48 -07:00
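For reference, a minimal sketch of the tanh soft-capping transform this PR adds: it is applied to the pre-softmax attention scores and squashes them smoothly into (-softcap, softcap). The standalone function below is illustrative, not the fused kernel code:
```
import torch

def softcap_scores(scores: torch.Tensor, softcap: float) -> torch.Tensor:
    # Squash attention scores smoothly into (-softcap, softcap) before the
    # softmax, instead of hard-clipping them.
    return softcap * torch.tanh(scores / softcap)
```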
Jianwei Dong
4e8d60069f
Add the return_softmax_lse parameter to the flash_attn_with_kvcache function to allow returning the logsumexp of the attention scores. ( #989 )
2024-07-08 08:29:40 -07:00
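A hedged usage sketch of the new parameter (tensor shapes are illustrative): with `return_softmax_lse=True` the call returns the output together with the logsumexp of the attention scores.
```
import torch
from flash_attn import flash_attn_with_kvcache

q = torch.randn(2, 1, 8, 64, device="cuda", dtype=torch.float16)
k_cache = torch.randn(2, 256, 8, 64, device="cuda", dtype=torch.float16)
v_cache = torch.randn(2, 256, 8, 64, device="cuda", dtype=torch.float16)

# Returns (out, softmax_lse) instead of just out.
out, lse = flash_attn_with_kvcache(q, k_cache, v_cache, return_softmax_lse=True)
```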
muoshuosha
6df7e0a02e
Fix the varlen deterministic test ( #1023 )
...
Co-authored-by: moshuosha <moshuosha@qq.com>
2024-07-03 11:07:57 -07:00
66RING
9486635c92
Fix typos of comments about shape. ( #837 )
2024-06-30 22:40:59 -07:00
JDKWangGuan
0d810cfb73
Fix KeyError handling for non-existing key in state_dict.pop() ( #898 )
...
Update handling for KeyError in state_dict.pop() for non-existing keys.
Changed state_dict.pop(f"h.{d}.attn.bias") to state_dict.pop(f"h.{d}.attn.bias", None) to prevent KeyError exceptions.
The following code can reproduce the issue:
```
from transformers import AutoTokenizer, GPT2Model, GPT2Config
from flash_attn.models.gpt import GPTLMHeadModel, GPTModel
# >>> transformers.__version__
# '4.38.2'
model_path = 'gpt2'
output_model_path = 'gpt2_model'
config = GPT2Config.from_pretrained(model_path, output_hidden_states=True)
model = GPT2Model.from_pretrained(model_path, from_tf=False, config=config)
'''
model fine-tuning here
'''
# dump the fine-tuned model
model.save_pretrained(output_model_path)
# load the fine-tuned model
config = GPT2Config.from_pretrained(output_model_path, output_hidden_states=True)
model = GPTModel.from_pretrained(output_model_path, config=config, strict=True) # failed due to KeyError: 'h.0.attn.bias'
model = GPTLMHeadModel.from_pretrained(output_model_path, config=config, strict=True) # failed due to KeyError: 'h.0.attn.bias'
```
2024-06-30 22:40:03 -07:00
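A minimal sketch of the defensive pattern the fix uses (the helper name and the `n_layer` loop bound are illustrative):
```
def strip_attn_bias(state_dict, n_layer):
    # Popping with a default avoids the KeyError when the checkpoint
    # does not contain the "attn.bias" buffers.
    for d in range(n_layer):
        state_dict.pop(f"h.{d}.attn.bias", None)
    return state_dict
```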
cao lei
6a2a16e994
fix typo ( #974 )
2024-06-30 22:39:39 -07:00
Nicolas Patry
5bf201966a
Fixing argument checking when using seqlenq_ngroups_swapped. ( #976 )
...
When the user passes `out` as a parameter and the other arguments trigger
`seqlenq_ngroups_swapped`, the CHECK_SHAPE is incorrect (since q's shape has
been modified).
2024-06-30 22:39:22 -07:00
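A hedged, shapes-only illustration of the swap involved (all numbers are illustrative): for decoding with grouped-query attention, when seqlen_q == 1 and nheads > nheads_k, q is viewed so that the query-head groups take the place of the sequence dimension, so a user-supplied `out` buffer no longer matches the shape being checked.
```
import torch

batch, nheads, nheads_k, d = 2, 32, 8, 128
q = torch.empty(batch, 1, nheads, d)                        # (batch, seqlen_q, nheads, d)
q_swapped = q.view(batch, nheads // nheads_k, nheads_k, d)  # seqlen_q <- nheads // nheads_k
out_user = torch.empty_like(q)   # still (2, 1, 32, 128): the shape check must account for the swap
```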
Liang
ab59ec3590
remove the swizzle part of sV.data() to get a completely non-swizzled sVtNoSwizzle ( #984 )
...
Co-authored-by: zl <zl@deepseek.com>
2024-06-30 22:38:44 -07:00
Grigory Sizov
f816dee63c
Support unpadded LSE layout ( #970 )
...
* Support unpadded LSE layout.
Co-authored-by: Xinfeng Xie <xfxie.ceca@gmail.com>
Co-authored-by: Jianyu Huang <hjyahead@gmail.com>
* Cleanup
* Fix unpadded LSE on split-kv path
* Fix formatting and comments
* Fix inline vs forceinline
---------
Co-authored-by: Xinfeng Xie <xfxie.ceca@gmail.com>
Co-authored-by: Jianyu Huang <hjyahead@gmail.com>
2024-06-27 02:38:13 -07:00
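A hedged, shapes-only sketch of the two layouts for the varlen path (values are illustrative): the padded layout stores LSE per batch entry up to max_seqlen_q, while the unpadded layout stores one entry per real token, matching the packed cu_seqlens layout of q.
```
import torch

cu_seqlens_q = torch.tensor([0, 3, 8])      # two sequences of lengths 3 and 5
batch, nheads, max_seqlen_q, total_q = 2, 4, 5, 8

lse_padded = torch.empty(batch, nheads, max_seqlen_q)   # filler past each sequence's length
lse_unpadded = torch.empty(nheads, total_q)             # one entry per real token, no padding
```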
Tri Dao
320fb59487
Update citation
2024-05-26 16:09:03 -07:00
Tri Dao
e2e4333c95
Limit to MAX_JOBS=1 with CUDA 12.2
2024-05-26 15:35:49 -07:00
Tri Dao
ce73503578
Bump to 2.5.9
2024-05-26 14:02:11 -07:00
Tri Dao
d732be1e67
Update to Cutlass 3.5
2024-05-26 12:49:33 -07:00
Tri Dao
af627063e3
[CI] Compile for pytorch 2.4.0.dev20240407 (for nvcr 24.05)
2024-05-26 12:41:17 -07:00
Wongboo
40e667236c
Update for python3.12 ( #870 )
2024-05-26 12:34:49 -07:00
Corey James Levinson
beb8b8ba9f
add exception to Timeout Error ( #963 )
...
When the connection times out, you get `URLError: <urlopen error timed out>`; in that case, build it from source.
2024-05-26 12:33:03 -07:00
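A hedged sketch of the fallback described above (the function and variable names are illustrative, not the actual setup.py code): a connection timeout surfaces as urllib.error.URLError, and the install should then fall back to compiling from source.
```
import urllib.error
import urllib.request

def fetch_wheel_or_build(wheel_url, wheel_filename, build_from_source):
    try:
        urllib.request.urlretrieve(wheel_url, wheel_filename)
    except (urllib.error.HTTPError, urllib.error.URLError):
        # Covers both "no prebuilt wheel for this configuration" and
        # "<urlopen error timed out>" when the connection cannot be made.
        build_from_source()
```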
lancerts
22339db185
remove an unused import ( #960 )
2024-05-23 11:12:31 -07:00
Wei Ji
9c0e9ee86d
Move packaging and ninja from install_requires to setup_requires ( #937 )
...
Set `packaging` and `ninja` as build-time dependencies rather than runtime dependencies.
2024-05-06 09:45:54 -07:00
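A hedged sketch of the packaging change (the project metadata is illustrative): build-only tools move out of install_requires so that installing a prebuilt wheel does not pull them in at runtime.
```
from setuptools import setup

setup(
    name="example-extension",                # illustrative project name
    setup_requires=["packaging", "ninja"],   # needed only while building
    install_requires=["torch", "einops"],    # needed at runtime
)
```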
Tri Dao
9a11f440d3
Bump to v2.5.8
2024-04-26 10:54:52 -07:00
Tri Dao
35060e7450
[CI] Compile for pytorch 2.2.2 and 2.3.0
2024-04-26 10:53:24 -07:00
Tri Dao
ec6d22143b
[CrossEntropy] Change ignored_index -> ignore_index
2024-04-26 10:50:41 -07:00
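A hedged usage sketch after the rename (vocabulary size and shapes are illustrative): the argument now matches the spelling used by torch.nn.CrossEntropyLoss, so existing code ports over directly.
```
import torch
from flash_attn.losses.cross_entropy import CrossEntropyLoss

loss_fn = CrossEntropyLoss(ignore_index=-100)
logits = torch.randn(8, 32000, device="cuda", dtype=torch.float16, requires_grad=True)
labels = torch.randint(0, 32000, (8,), device="cuda")
labels[0] = -100                  # this position is excluded from the loss
loss = loss_fn(logits, labels)
```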
Tri Dao
85881f547f
Bump to v2.5.7
2024-04-07 20:13:05 -07:00
Tri Dao
2aea958f89
[CI] Compile with torch 2.3.0.dev20240207
2024-04-07 20:11:52 -07:00
Tri Dao
656daef4ea
Use Cute's local_tile to get gQ, gK, gV
2024-04-07 20:10:19 -07:00
Tri Dao
9eb3d099c1
Transpose out when swapping seqlen_q and num_groups
2024-04-07 20:10:19 -07:00
Ivan Komarov
f692b98d80
Fix spurious re-compilations of rotary_kernel ( #911 )
...
All integer parameters are specialized by default, so the two parameters
removed in this commit could lead to kernel re-compilation, even if
they were completely unused.
2024-04-05 13:40:41 -07:00
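A hedged Triton sketch of the issue (the kernel and argument names are illustrative): integer arguments are specialized by default, so even an argument the kernel never reads can force a fresh compile when its value changes specialization class.
```
import triton
import triton.language as tl

@triton.jit
def copy_kernel(x_ptr, out_ptr, n_elements, unused_len, BLOCK: tl.constexpr):
    # `unused_len` is never read, but because integer arguments are specialized
    # by default, passing different values can still trigger re-compilation.
    offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n_elements
    tl.store(out_ptr + offs, tl.load(x_ptr + offs, mask=mask), mask=mask)

# The commit's fix is simply to drop such unused integer parameters
# (do_not_specialize would be another way to avoid the re-compiles).
```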
Driss Guessous
23e8fa5a26
Add the option for the macro and note ( #893 )
2024-03-27 19:12:11 -07:00
ljss
3e9414f1c3
Minor fix in compute_attn_1rowblock_splitkv ( #900 )
2024-03-27 19:11:45 -07:00
Tri Dao
36587c01cb
[LayerNorm] Update layer_norm_linear
2024-03-18 23:15:33 -07:00
Markus Krimmel
6bbc532388
fix: cast the alibi slopes to torch.float32 ( #846 )
2024-03-15 00:49:40 -07:00
Driss Guessous
4a73e903da
Add in macros for defining __grid_constant__ ( #852 )
2024-03-15 00:48:54 -07:00