Commit Graph

337 Commits

Tri Dao
bedf877467 [CrossEntropy] Fix case where the labels address is not aligned to 16 bytes 2024-10-05 02:03:10 -07:00
Zhihao Shen
30e1ef0f79 minify torch.torch.int32 to torch.int32 (#1237) 2024-09-18 00:32:59 -07:00
Antoni Viros
83e41b3ca4 Add custom ops for compatibility with PT Compile (#1139)
* Add custom ops for compatibility with PT Compile

* Add support for varlen functions too

* Add version checks for pytorch API

* Fix PT compile interfaces so it works e2e

* Make sure PT < 2.4 runs fine

* Fix python mistake

* Fix all the autograd magic issues

* typo on head_dim

* Fix deterministic test failures, remove unneeded detaches()

* remove test requires_grad

* Resolve all the pytorch versioning issues

* C++ and python refactor to improve padding management for torch.compile()

* Add improvements suggested by @anijain2305
2024-09-17 19:49:26 -07:00
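The torch.compile compatibility work above hinges on registering the attention kernels as PyTorch custom ops so that the compiler treats them as opaque calls. A minimal sketch of that mechanism, assuming the torch.library API available in PyTorch >= 2.4 (the op name and body are illustrative, not the actual flash-attn registrations):

```python
import torch

# Illustrative op only; the real flash-attn registrations wrap the CUDA kernels.
@torch.library.custom_op("mylib::scaled_add", mutates_args=())
def scaled_add(x: torch.Tensor, y: torch.Tensor, alpha: float) -> torch.Tensor:
    # torch.compile does not trace into a custom op's body; it calls it as a unit.
    return x + alpha * y

@scaled_add.register_fake
def _(x: torch.Tensor, y: torch.Tensor, alpha: float) -> torch.Tensor:
    # Fake (meta) implementation: propagate shape/dtype so the op can be traced symbolically.
    return torch.empty_like(x)
```

A fake implementation like this is what allows an op to be traced with symbolic shapes instead of forcing a graph break.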
Ying Zhang
8cbc8a042f small fixes 2024-09-16 14:54:39 -07:00
Ying Zhang
cdbbe844b1 minor changes to unpad_input test util func 2024-09-16 14:24:11 -07:00
Ying Zhang
db80387343 Add seqused_q in fwd / bwd and seqused_k in bwd. 2024-09-16 14:24:11 -07:00
Tri Dao
8c20cfef49 [Rotary] Support qkv block layout from GQA 2024-09-11 10:39:58 -07:00
Tri Dao
c7f32a8409 [CrossEntropy] Support precomputed LSE 2024-09-08 09:24:43 -07:00
Tri Dao
d79f9b41a8 [CrossEntropy] Use online softmax to simplify implementation 2024-08-24 17:40:39 -07:00
Tri Dao
bcd918f275 [LayerNorm] Add option to write result to out and residual_out 2024-08-15 14:43:47 -07:00
Tri Dao
bd82d6c6eb Revert "[LayerNorm] Don't store x + residual if we don't need gradients"
This reverts commit 800401847e.
2024-08-15 12:02:39 -07:00
Tri Dao
800401847e [LayerNorm] Don't store x + residual if we don't need gradients 2024-08-15 11:08:46 -07:00
SueJane
3f1b4d38e7 Fix: check the type of max_seqlen_k instead of checking max_seqlen twice (#1127) 2024-08-05 08:59:23 -07:00
Tri Dao
418d677192 Bump to v2.6.3 2024-07-25 01:31:28 -07:00
Tri Dao
59594f2a67 Bump to v2.6.2 2024-07-23 02:30:05 -07:00
youkaichao
ef3e358a25 remove lambda (#1056) 2024-07-21 23:24:38 -07:00
Tri Dao
898dd4bbf2 Pass seqused_k to _flash_attn_varlen_forward 2024-07-13 00:08:27 -07:00
Tri Dao
7551202cb2 Bump to v2.6.1 2024-07-11 08:28:32 -07:00
Tri Dao
40e534a7f6 Implement cache_leftpad 2024-07-11 08:17:15 -07:00
Tri Dao
116b05f9b0 [CI] Compile with pytorch 2.4.0.dev20240514 2024-07-11 02:53:30 -07:00
Tri Dao
da11d1b853 Bump v2.6.0 2024-07-10 21:34:58 -07:00
Tri Dao
81e01efd4b More typo fixes 2024-07-10 10:19:17 -07:00
Tri Dao
72e27c6320 Fix typo with softcapping 2024-07-10 00:33:52 -07:00
Phil Wang
f4628b43ec missing commas and backwards return arguments (#1032)
* missing commas

* another fix
2024-07-09 10:56:29 -07:00
Nicolas Patry
8f873cc6ac Implement softcapping. (#1025)
* Softcap v2 (fwd only).

* Some missing interface + remove overrides in tests.
2024-07-08 11:24:48 -07:00
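For readers unfamiliar with the term, soft-capping bounds the raw attention scores with a tanh before the softmax. A small sketch of the assumed form (not quoted from the diff; the helper name is made up):

```python
import torch

def softcap(scores: torch.Tensor, cap: float) -> torch.Tensor:
    # Squash attention logits into (-cap, cap) so they cannot grow without bound.
    return cap * torch.tanh(scores / cap)
```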
Jianwei Dong
4e8d60069f Add the return_softmax_lse parameter to the flash_attn_with_kvcache function to allow returning the logsumexp of the attention scores. (#989) 2024-07-08 08:29:40 -07:00
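A hedged usage sketch of the new flag (tensor shapes are illustrative; assumes a CUDA build of flash-attn that includes this change):

```python
import torch
from flash_attn import flash_attn_with_kvcache

batch, seqlen_q, seqlen_k, nheads, headdim = 2, 1, 128, 8, 64
q = torch.randn(batch, seqlen_q, nheads, headdim, device="cuda", dtype=torch.float16)
k_cache = torch.randn(batch, seqlen_k, nheads, headdim, device="cuda", dtype=torch.float16)
v_cache = torch.randn_like(k_cache)

# With return_softmax_lse=True the call also returns the logsumexp of the attention scores.
out, lse = flash_attn_with_kvcache(q, k_cache, v_cache, return_softmax_lse=True)
```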
JDKWangGuan
0d810cfb73 Fix KeyError handling for non-existing key in state_dict.pop() (#898)
Update handling for KeyError in state_dict.pop() for non-existing keys.
Changed state_dict.pop(f"h.{d}.attn.bias") to state_dict.pop(f"h.{d}.attn.bias", None) to prevent KeyError exceptions.


The following code reproduces the issue:
```python
from transformers import AutoTokenizer, GPT2Model, GPT2Config
from flash_attn.models.gpt import GPTLMHeadModel, GPTModel

# >>> transformers.__version__
# '4.38.2'

model_path = 'gpt2'
output_model_path = 'gpt2_model'
config = GPT2Config.from_pretrained(model_path, output_hidden_states=True)
model = GPT2Model.from_pretrained(model_path, from_tf=False, config=config)
# ... model fine-tuning here ...
# dump the fine-tuned model
model.save_pretrained(output_model_path)

# load the fine-tuned model
config = GPT2Config.from_pretrained(output_model_path, output_hidden_states=True)
model = GPTModel.from_pretrained(output_model_path, config=config, strict=True)  # failed due to KeyError: 'h.0.attn.bias'
model = GPTLMHeadModel.from_pretrained(output_model_path, config=config, strict=True)  # failed due to KeyError: 'h.0.attn.bias'

```
2024-06-30 22:40:03 -07:00
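The fix itself is the defensive pop described above; a minimal standalone illustration:

```python
# Popping with a default turns a missing key into a no-op instead of a KeyError.
state_dict = {"h.0.attn.weight": 1.0}  # no 'h.0.attn.bias' entry
d = 0
bias = state_dict.pop(f"h.{d}.attn.bias", None)
print(bias)  # None
```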
Grigory Sizov
f816dee63c Support unpadded LSE layout (#970)
* Support unpadded LSE layout.

Co-authored-by: Xinfeng Xie <xfxie.ceca@gmail.com>
Co-authored-by: Jianyu Huang <hjyahead@gmail.com>

* Cleanup

* Fix unpadded LSE on split-kv path

* Fix formatting and comments

* Fix inline vs forceinline

---------

Co-authored-by: Xinfeng Xie <xfxie.ceca@gmail.com>
Co-authored-by: Jianyu Huang <hjyahead@gmail.com>
2024-06-27 02:38:13 -07:00
Tri Dao
320fb59487 Update citation 2024-05-26 16:09:03 -07:00
Tri Dao
e2e4333c95 Limit to MAX_JOBS=1 with CUDA 12.2 2024-05-26 15:35:49 -07:00
Tri Dao
ce73503578 Bump to 2.5.9 2024-05-26 14:02:11 -07:00
lancerts
22339db185 remove an unused import (#960) 2024-05-23 11:12:31 -07:00
Tri Dao
9a11f440d3 Bump to v2.5.8 2024-04-26 10:54:52 -07:00
Tri Dao
ec6d22143b [CrossEntropy] Change ignored_index -> ignore_index 2024-04-26 10:50:41 -07:00
Tri Dao
85881f547f Bump to v2.5.7 2024-04-07 20:13:05 -07:00
Ivan Komarov
f692b98d80 Fix spurious re-compilations of rotary_kernel (#911)
All integer parameters are specialized by default, so the two parameters
removed in this commit could lead to kernel re-compilation, even if
they were completely unused.
2024-04-05 13:40:41 -07:00
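A sketch of the pitfall described above, assuming Triton's default behaviour of specializing integer kernel arguments (the kernel and argument names are illustrative, not the actual rotary kernel):

```python
import triton
import triton.language as tl

@triton.jit
def copy_kernel(X, OUT, n_elements, unused_int, BLOCK: tl.constexpr):
    # Triton specializes integer arguments (e.g. on equality to 1 and divisibility by 16),
    # so calls that vary `unused_int` can trigger fresh compilations even though the value
    # is never read. Dropping such arguments avoids the spurious recompiles.
    pid = tl.program_id(0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n_elements
    tl.store(OUT + offs, tl.load(X + offs, mask=mask), mask=mask)
```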
Tri Dao
36587c01cb [LayerNorm] Update layer_norm_linear 2024-03-18 23:15:33 -07:00
Markus Krimmel
6bbc532388 fix: cast the alibi slopes to torch.float32 (#846) 2024-03-15 00:49:40 -07:00
Grigory Sizov
2a15840f09 Enable paged attention in varlen forward (#831)
* Enable paged attention in varlen forward

* Format + fix padding
2024-03-15 00:48:19 -07:00
Tri Dao
6c9e60de56 Bump to v2.5.6 2024-03-01 22:09:56 -08:00
Tri Dao
87a1277653 Bump to v2.5.5 2024-02-21 15:58:23 -08:00
Tri Dao
43950dda45 Bump to v2.5.4 2024-02-20 16:30:16 -08:00
Tri Dao
5cdabc2809 Bump to v2.5.3 2024-02-10 01:06:27 -08:00
Tri Dao
a190df011c Add window_size option to ParallelMHA 2024-02-10 01:02:14 -08:00
Tri Dao
61a7772479 Bump to v2.5.2 2024-01-31 02:44:24 -08:00
Tri Dao
ef0ed10622 Add window_size option to MHA and GPT 2024-01-31 02:42:23 -08:00
Tri Dao
dc72d960a7 [CI] Install torch 2.3 using index 2024-01-30 14:32:29 -08:00
Tri Dao
daf37a9d8a Bump to v2.5.1 2024-01-29 21:03:38 -08:00
Avelina9X
c94cd09744 Updated missing docstrings for args and returns in bert_padding.py (#795)
* Updated docstrings of bert_padding.py

Added docstrings for missing arguments in the unpad and pad methods.

* Update bert_padding.py

Fixed spelling mistakes
2024-01-27 09:16:25 -08:00
Tao He
204c3c6d1b Fixes an error in comment (#785)
Signed-off-by: Tao He <sighingnow@gmail.com>
2024-01-23 12:38:29 -08:00