flash-attention/flash_attn/layers
Latest commit: `70ab266a56` by Volodymyr Kyrylov, 2023-07-08 12:01:07 +02:00

rotary: update cos/sin cache when switching from inference mode
This resolves a RuntimeError raised when training resumes after running evaluation in inference mode:

```
  File "/home/proger/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/proger/.local/lib/python3.10/site-packages/flash_attn/modules/mha.py", line 492, in forward
    qkv = self.rotary_emb(qkv)
  File "/home/proger/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/proger/.local/lib/python3.10/site-packages/flash_attn/layers/rotary.py", line 229, in forward
    return apply_rotary_emb_qkv_(
  File "/home/proger/.local/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
RuntimeError: Inference tensors cannot be saved for backward. To work around you can make a clone to get a normal tensor and use it in autograd.
```
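For context, here is a minimal sketch of the failure mode and the shape of the fix. A rotary module that lazily caches its cos/sin tables will, after an evaluation pass under `torch.inference_mode()`, hold inference tensors in that cache; the next training forward then tries to save them for backward and fails with the RuntimeError above. The class below is illustrative only, not the verbatim flash_attn implementation; the exact staleness condition is an assumption, and the `Tensor.is_inference()` guard conveys the gist of the commit.

```python
import torch
from torch import nn


class RotaryEmbedding(nn.Module):
    """Illustrative rotary embedding with a lazily built cos/sin cache.

    Sketch only: names like _update_cos_sin_cache / _cos_cached mirror the
    flash_attn style, but this is not the library's actual code.
    """

    def __init__(self, dim: int, base: float = 10000.0):
        super().__init__()
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
        self.register_buffer("inv_freq", inv_freq, persistent=False)
        self._seq_len_cached = 0
        self._cos_cached = None
        self._sin_cached = None

    def _update_cos_sin_cache(self, seqlen, device, dtype):
        # Rebuild the cache when it is stale -- and, crucially, also when it
        # was created under torch.inference_mode(): inference tensors cannot
        # be saved for backward, so a cache built during evaluation must be
        # refreshed before training resumes.
        if (
            seqlen > self._seq_len_cached
            or self._cos_cached is None
            or self._cos_cached.device != device
            or self._cos_cached.dtype != dtype
            or (self.training and self._cos_cached.is_inference())
        ):
            self._seq_len_cached = seqlen
            t = torch.arange(seqlen, device=device, dtype=self.inv_freq.dtype)
            freqs = torch.outer(t, self.inv_freq.to(device))
            self._cos_cached = freqs.cos().to(dtype)
            self._sin_cached = freqs.sin().to(dtype)


if __name__ == "__main__":
    rotary = RotaryEmbedding(64)
    # Evaluation under inference mode populates the cache with inference tensors.
    rotary.eval()
    with torch.inference_mode():
        rotary._update_cos_sin_cache(128, torch.device("cpu"), torch.float32)
    assert rotary._cos_cached.is_inference()
    # Back to training: the is_inference() guard forces a rebuild, so the
    # cos/sin tensors saved for backward are ordinary tensors again.
    rotary.train()
    rotary._update_cos_sin_cache(128, torch.device("cpu"), torch.float32)
    assert not rotary._cos_cached.is_inference()
```

Without the `is_inference()` check, the second call would reuse the inference-mode cache and the next autograd-enabled forward would raise the RuntimeError shown in the traceback.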
| File | Last commit | Date |
| --- | --- | --- |
| `__init__.py` | Add `__init__.py` files to subdirectories for installation | 2022-11-17 16:55:44 -08:00 |
| `patch_embed.py` | Simplify FusedDense | 2022-12-22 21:25:31 -08:00 |
| `rotary.py` | rotary: update cos/sin cache when switching from inference mode | 2023-07-08 12:01:07 +02:00 |