* Updated docstrings of bert_padding.py
Added docstrings for missing arguments in the unpad and pad methods.
* Update bert_padding.py
Fixed spelling mistakes
According to the `setup.py` file, the only dependencies are torch and einops, yet `bert_padding.py` requires `numpy` solely to multiply the elements of a `torch.Size` object. This change removes that dependency, allowing FlashAttention to be used without numpy.
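One numpy-free way to take the product of a `torch.Size` is the stdlib `math.prod`, since `torch.Size` is a tuple subclass. This is a sketch of the idea, not necessarily the exact replacement used in the change:

```python
import math
import torch

x = torch.zeros(2, 3, 4)

# torch.Size behaves like a tuple of ints, so math.prod
# replaces numpy.prod for computing the total element count.
n = math.prod(x.shape)

assert n == x.numel() == 24
```

`torch.Tensor.numel()` gives the same result directly when the full shape is involved; `math.prod` is useful when only a slice of the shape (e.g. all but the last dimension) needs to be multiplied.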