* check in the two ways of approaching the softcapping backward pass, both functional
* prepare the softcap switch for backwards
* temporary
* cleanup to the way Tri prefers
* calculate dtanh when copying from scores -> dtanh Tensor
* no ternary operators allowed for constexpr, so just use some hack found online
* fix maybe_dtanh, restore some files
* restore another file
* move calculate_dtanh to utils and colocate with apply_softcap (see the sketch after this list)
* cleanup
* maybe last cleanup
* save for another pr
* remove a stray line
* fix spacing
* fix an issue, and make test_flash_attn.py ready to test softcapping backwards
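
Below is a hedged Python sketch of the softcapping math these commits refer to: `apply_softcap` is the forward transform and `calculate_dtanh` is the derivative factor needed for the backward pass. The tensor-level formulation here is only illustrative; the actual kernels apply it element-wise in CUDA.

```python
import torch

def apply_softcap(scores: torch.Tensor, softcap: float) -> torch.Tensor:
    # Forward: x -> softcap * tanh(x / softcap), which bounds scores to (-softcap, softcap).
    return softcap * torch.tanh(scores / softcap)

def calculate_dtanh(capped_scores: torch.Tensor, softcap: float) -> torch.Tensor:
    # Backward factor: d/dx [softcap * tanh(x / softcap)] = 1 - tanh(x / softcap)^2.
    # tanh(x / softcap) is just capped_scores / softcap, so the factor can be
    # computed on the fly while the scores are copied into the dtanh tensor.
    t = capped_scores / softcap
    return 1.0 - t * t
```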
Update handling of non-existing keys in state_dict.pop().
Changed state_dict.pop(f"h.{d}.attn.bias") to state_dict.pop(f"h.{d}.attn.bias", None) so that a missing key no longer raises a KeyError.
The following code reproduces the issue:
```python
from transformers import GPT2Model, GPT2Config
from flash_attn.models.gpt import GPTLMHeadModel, GPTModel
# >>> transformers.__version__
# '4.38.2'
model_path = 'gpt2'
output_model_path = 'gpt2_model'
config = GPT2Config.from_pretrained(model_path, output_hidden_states=True)
model = GPT2Model.from_pretrained(model_path, from_tf=False, config=config)
'''
model fine-tuning here
'''
# dump the fine-tuned model
model.save_pretrained(output_model_path)
# load the fine-tuned model
config = GPT2Config.from_pretrained(output_model_path, output_hidden_states=True)
model = GPTModel.from_pretrained(output_model_path, config=config, strict=True) # failed due to KeyError: 'h.0.attn.bias'
model = GPTLMHeadModel.from_pretrained(output_model_path, config=config, strict=True) # failed due to KeyError: 'h.0.attn.bias'
```
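
For reference, a minimal sketch of the fix in the state-dict remapping (the surrounding loop is assumed; only the key named in the description is shown). Passing a default to `dict.pop()` turns a missing key into a no-op:

```python
# Assumed shape of the remapping loop; only the key from the description is shown.
for d in range(config.n_layer):
    # With a default of None, pop() no longer raises KeyError when the
    # checkpoint does not contain the attention bias buffer.
    state_dict.pop(f"h.{d}.attn.bias", None)
```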
When the user passes `out` as a parameter and the other arguments trigger the
`seqlenq_ngroups_swapped` path, the CHECK_SHAPE on `out` is incorrect, since
q's shape has already been modified.
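
A hedged Python illustration of the shape mismatch (tensor names and sizes are made up; the swap itself mirrors the reshape described above, not the actual C++ code):

```python
import torch

batch, num_heads, num_heads_k, head_dim = 2, 32, 4, 128  # MQA/GQA decode, seqlen_q == 1
ngroups = num_heads // num_heads_k

q = torch.randn(batch, 1, num_heads, head_dim)  # user-visible q
out = torch.empty_like(q)                       # user-supplied out buffer, same layout as q

# With seqlenq_ngroups_swapped, the query head groups are folded into the seqlen dimension:
q_swapped = q.reshape(batch, num_heads_k, ngroups, head_dim).transpose(1, 2)
print(q_swapped.shape)  # (batch, ngroups, num_heads_k, head_dim)

# Checking out's shape against the modified q is wrong: out still carries the
# original (batch, 1, num_heads, head_dim) layout, so the check must run against
# the pre-swap shape (or out must be swapped the same way).
assert out.shape != q_swapped.shape
```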
All integer parameters are specialized by default, so the two parameters
removed in this commit could trigger kernel re-compilation even though
they were completely unused.
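
If these are Triton JIT kernels (an assumption), the behavior can be illustrated as follows: integer arguments feed into the specialization key (notably whether they equal 1 or are divisible by 16), so even an argument the kernel body never reads can force a re-compile when its value crosses those boundaries. The kernel below is a hypothetical minimal example, not the kernel touched by this commit.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_one(x_ptr, n_elements, unused_int, BLOCK: tl.constexpr):
    # `unused_int` is never read, but as an integer argument it still
    # participates in Triton's specialization key.
    pid = tl.program_id(0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n_elements
    x = tl.load(x_ptr + offs, mask=mask)
    tl.store(x_ptr + offs, x + 1, mask=mask)

x = torch.ones(1024, device="cuda")
grid = (triton.cdiv(x.numel(), 256),)
add_one[grid](x, x.numel(), 15, BLOCK=256)  # first compilation
add_one[grid](x, x.numel(), 16, BLOCK=256)  # may compile a second specialization
```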