Commit Graph

337 Commits

Tri Dao
bedf877467 [CrossEntropy] Fix case where the labels address is not aligned to 16 bytes 2024-10-05 02:03:10 -07:00
Zhihao Shen
30e1ef0f79 minify torch.torch.int32 to torch.int32 (#1237) 2024-09-18 00:32:59 -07:00
Antoni Viros
83e41b3ca4 Add custom ops for compatibility with PT Compile (#1139)
* Add custom ops for compatibility with PT Compile

* Add support for varlen functions too

* Add version checks for pytorch API

* Fix PT compile interfaces so it works e2e

* Make sure PT < 2.4 runs fine

* Fix python mistake

* Fix all the autograd magic issues

* typo on head_dim

* Fix deterministic test failures, remove unneeded detaches()

* remove test requires_grad

* Resolve all the pytorch versioning issues

* C++ and python refactor to improve padding management for torch.compile()

* Add improvements suggested by @anijain2305
2024-09-17 19:49:26 -07:00
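The torch.compile compatibility work above hinges on registering the attention kernels as PyTorch custom ops so that the compiler treats them as opaque calls. A minimal sketch of that mechanism, assuming the torch.library API available in PyTorch >= 2.4 (the op name and body are illustrative, not the actual flash-attn registrations):

```python
import torch

# Illustrative op only; the real flash-attn registrations wrap the CUDA kernels.
@torch.library.custom_op("mylib::scaled_add", mutates_args=())
def scaled_add(x: torch.Tensor, y: torch.Tensor, alpha: float) -> torch.Tensor:
    # torch.compile does not trace into a custom op's body; it calls it as a unit.
    return x + alpha * y

@scaled_add.register_fake
def _(x: torch.Tensor, y: torch.Tensor, alpha: float) -> torch.Tensor:
    # Fake (meta) implementation: propagate shape/dtype so the op can be traced symbolically.
    return torch.empty_like(x)
```

A fake implementation like this is what allows an op to be traced with symbolic shapes instead of forcing a graph break.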
Ying Zhang
8cbc8a042f small fixes 2024-09-16 14:54:39 -07:00
Ying Zhang
cdbbe844b1 minor changes to unpad_input test util func 2024-09-16 14:24:11 -07:00
Ying Zhang
db80387343 Add seqused_q in fwd / bwd and seqused_k in bwd. 2024-09-16 14:24:11 -07:00
Tri Dao
8c20cfef49 [Rotary] Support qkv block layout from GQA 2024-09-11 10:39:58 -07:00
Tri Dao
c7f32a8409 [CrossEntropy] Support precomputed LSE 2024-09-08 09:24:43 -07:00
Tri Dao
d79f9b41a8 [CrossEntropy] Use online softmax to simplify implementation 2024-08-24 17:40:39 -07:00
Tri Dao
bcd918f275 [LayerNorm] Add option to write result to out and residual_out 2024-08-15 14:43:47 -07:00
Tri Dao
bd82d6c6eb Revert "[LayerNorm] Don't store x + residual if we don't need gradients"
This reverts commit 800401847e.
2024-08-15 12:02:39 -07:00
Tri Dao
800401847e [LayerNorm] Don't store x + residual if we don't need gradients 2024-08-15 11:08:46 -07:00
SueJane
3f1b4d38e7 Fix: check the type of max_seqlen_k instead of checking max_seqlen twice (#1127) 2024-08-05 08:59:23 -07:00
Tri Dao
418d677192 Bump to v2.6.3 2024-07-25 01:31:28 -07:00
Tri Dao
59594f2a67 Bump to v2.6.2 2024-07-23 02:30:05 -07:00
youkaichao
ef3e358a25 remove lambda (#1056) 2024-07-21 23:24:38 -07:00
Tri Dao
898dd4bbf2 Pass seqused_k to _flash_attn_varlen_forward 2024-07-13 00:08:27 -07:00
Tri Dao
7551202cb2 Bump to v2.6.1 2024-07-11 08:28:32 -07:00
Tri Dao
40e534a7f6 Implement cache_leftpad 2024-07-11 08:17:15 -07:00
Tri Dao
116b05f9b0 [CI] Compile with pytorch 2.4.0.dev20240514 2024-07-11 02:53:30 -07:00
Tri Dao
da11d1b853 Bump v2.6.0 2024-07-10 21:34:58 -07:00
Tri Dao
81e01efd4b More typo fixes 2024-07-10 10:19:17 -07:00
Tri Dao
72e27c6320 Fix typo with softcapping 2024-07-10 00:33:52 -07:00
Phil Wang
f4628b43ec missing commas and backwards return arguments (#1032)
* missing commas

* another fix
2024-07-09 10:56:29 -07:00
Nicolas Patry
8f873cc6ac Implement softcapping. (#1025)
* Softcap v2 (fwd only).

* Some missing interface + remove overrides in tests.
2024-07-08 11:24:48 -07:00
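For readers unfamiliar with the term, soft-capping bounds the raw attention scores with a tanh before the softmax. A small sketch of the assumed form (not quoted from the diff; the helper name is made up):

```python
import torch

def softcap(scores: torch.Tensor, cap: float) -> torch.Tensor:
    # Squash attention logits into (-cap, cap) so they cannot grow without bound.
    return cap * torch.tanh(scores / cap)
```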
Jianwei Dong
4e8d60069f Add the return_softmax_lse parameter to the flash_attn_with_kvcache function to allow returning the logsumexp of the attention scores. (#989) 2024-07-08 08:29:40 -07:00
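A hedged usage sketch of the new flag (tensor shapes are illustrative; assumes a CUDA build of flash-attn that includes this change):

```python
import torch
from flash_attn import flash_attn_with_kvcache

batch, seqlen_q, seqlen_k, nheads, headdim = 2, 1, 128, 8, 64
q = torch.randn(batch, seqlen_q, nheads, headdim, device="cuda", dtype=torch.float16)
k_cache = torch.randn(batch, seqlen_k, nheads, headdim, device="cuda", dtype=torch.float16)
v_cache = torch.randn_like(k_cache)

# With return_softmax_lse=True the call also returns the logsumexp of the attention scores.
out, lse = flash_attn_with_kvcache(q, k_cache, v_cache, return_softmax_lse=True)
```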
JDKWangGuan
0d810cfb73 Fix KeyError handling for non-existing key in state_dict.pop() (#898)
Update handling for KeyError in state_dict.pop() for non-existing keys.
Changed state_dict.pop(f"h.{d}.attn.bias") to state_dict.pop(f"h.{d}.attn.bias", None) to prevent KeyError exceptions.


The following code reproduces the issue:
```python
from transformers import AutoTokenizer, GPT2Model, GPT2Config
from flash_attn.models.gpt import GPTLMHeadModel, GPTModel

# >>> transformers.__version__
# '4.38.2'

model_path = 'gpt2'
output_model_path = 'gpt2_model'
config = GPT2Config.from_pretrained(model_path, output_hidden_states=True)
model = GPT2Model.from_pretrained(model_path, from_tf=False, config=config)
# ... model fine-tuning here ...
# dump the fine-tuned model
model.save_pretrained(output_model_path)

# load the fine-tuned model
config = GPT2Config.from_pretrained(output_model_path, output_hidden_states=True)
model = GPTModel.from_pretrained(output_model_path, config=config, strict=True)  # failed due to KeyError: 'h.0.attn.bias'
model = GPTLMHeadModel.from_pretrained(output_model_path, config=config, strict=True)  # failed due to KeyError: 'h.0.attn.bias'

```
2024-06-30 22:40:03 -07:00
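The fix itself is the defensive pop described above; a minimal standalone illustration:

```python
# Popping with a default turns a missing key into a no-op instead of a KeyError.
state_dict = {"h.0.attn.weight": 1.0}  # no 'h.0.attn.bias' entry
d = 0
bias = state_dict.pop(f"h.{d}.attn.bias", None)
print(bias)  # None
```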
Grigory Sizov
f816dee63c Support unpadded LSE layout (#970)
* Support unpadded LSE layout.

Co-authored-by: Xinfeng Xie <xfxie.ceca@gmail.com>
Co-authored-by: Jianyu Huang <hjyahead@gmail.com>

* Cleanup

* Fix unpadded LSE on split-kv path

* Fix formatting and comments

* Fix inline vs forceinline

---------

Co-authored-by: Xinfeng Xie <xfxie.ceca@gmail.com>
Co-authored-by: Jianyu Huang <hjyahead@gmail.com>
2024-06-27 02:38:13 -07:00
Tri Dao
320fb59487 Update citation 2024-05-26 16:09:03 -07:00
Tri Dao
e2e4333c95 Limit to MAX_JOBS=1 with CUDA 12.2 2024-05-26 15:35:49 -07:00
Tri Dao
ce73503578 Bump to 2.5.9 2024-05-26 14:02:11 -07:00
lancerts
22339db185 remove an unused import (#960) 2024-05-23 11:12:31 -07:00
Tri Dao
9a11f440d3 Bump to v2.5.8 2024-04-26 10:54:52 -07:00
Tri Dao
ec6d22143b [CrossEntropy] Change ignored_index -> ignore_index 2024-04-26 10:50:41 -07:00
Tri Dao
85881f547f Bump to v2.5.7 2024-04-07 20:13:05 -07:00
Ivan Komarov
f692b98d80 Fix spurious re-compilations of rotary_kernel (#911)
All integer parameters are specialized by default, so the two parameters
removed in this commit could lead to kernel re-compilation, even if
they were completely unused.
2024-04-05 13:40:41 -07:00
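A sketch of the pitfall described above, assuming Triton's default behaviour of specializing integer kernel arguments (the kernel and argument names are illustrative, not the actual rotary kernel):

```python
import triton
import triton.language as tl

@triton.jit
def copy_kernel(X, OUT, n_elements, unused_int, BLOCK: tl.constexpr):
    # Triton specializes integer arguments (e.g. on equality to 1 and divisibility by 16),
    # so calls that vary `unused_int` can trigger fresh compilations even though the value
    # is never read. Dropping such arguments avoids the spurious recompiles.
    pid = tl.program_id(0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n_elements
    tl.store(OUT + offs, tl.load(X + offs, mask=mask), mask=mask)
```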
Tri Dao
36587c01cb [LayerNorm] Update layer_norm_linear 2024-03-18 23:15:33 -07:00
Markus Krimmel
6bbc532388 fix: cast the alibi slopes to torch.float32 (#846) 2024-03-15 00:49:40 -07:00
Grigory Sizov
2a15840f09 Enable paged attention in varlen forward (#831)
* Enable paged attention in varlen forward

* Format + fix padding
2024-03-15 00:48:19 -07:00
Tri Dao
6c9e60de56 Bump to v2.5.6 2024-03-01 22:09:56 -08:00
Tri Dao
87a1277653 Bump to v2.5.5 2024-02-21 15:58:23 -08:00
Tri Dao
43950dda45 Bump to v2.5.4 2024-02-20 16:30:16 -08:00
Tri Dao
5cdabc2809 Bump to v2.5.3 2024-02-10 01:06:27 -08:00
Tri Dao
a190df011c Add window_size option to ParallelMHA 2024-02-10 01:02:14 -08:00
Tri Dao
61a7772479 Bump to v2.5.2 2024-01-31 02:44:24 -08:00
Tri Dao
ef0ed10622 Add window_size option to MHA and GPT 2024-01-31 02:42:23 -08:00
Tri Dao
dc72d960a7 [CI] Install torch 2.3 using index 2024-01-30 14:32:29 -08:00
Tri Dao
daf37a9d8a Bump to v2.5.1 2024-01-29 21:03:38 -08:00
Avelina9X
c94cd09744 Updated missing docstrings for args and returns in bert_padding.py (#795)
* Updated docstrings of bert_padding.py

Added docstrings for missing arguments in the unpad and pad methods.

* Update bert_padding.py

Fixed spelling mistakes
2024-01-27 09:16:25 -08:00
Tao He
204c3c6d1b Fixes an error in comment (#785)
Signed-off-by: Tao He <sighingnow@gmail.com>
2024-01-23 12:38:29 -08:00