flash-attention/flash_attn/models
| File | Last commit message | Last commit date |
| --- | --- | --- |
| __init__.py | Add __init__.py files to subdirectories for installation | 2022-11-17 16:55:44 -08:00 |
| baichuan.py | Pass alibi slopes to flash_attn_with_kvcache during generation | 2023-12-24 20:31:59 -08:00 |
| bert.py | minify torch.torch.int32 to torch.int32 (#1237) | 2024-09-18 00:32:59 -07:00 |
| bigcode.py | Add BigCode converters (#532) | 2023-09-10 17:24:50 -07:00 |
| btlm.py | Implement BTLM model | 2023-12-24 20:35:12 -08:00 |
| falcon.py | Run isort and black on python files | 2023-08-18 14:22:11 -07:00 |
| gpt_neox.py | [Gen] Remove minor dead code | 2023-12-19 22:57:39 -08:00 |
| gpt.py | Fix KeyError handling for non-existing key in state_dict.pop() (#898) | 2024-06-30 22:40:03 -07:00 |
| gptj.py | Run isort and black on python files | 2023-08-18 14:22:11 -07:00 |
| llama.py | Fix E1136 (#563) | 2023-09-21 11:48:23 -07:00 |
| opt.py | Run isort and black on python files | 2023-08-18 14:22:11 -07:00 |
| vit.py | [LayerNorm] Switch from CUDA to Triton implementation | 2024-01-05 00:31:17 -08:00 |
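These files implement optimized versions of common transformer architectures on top of FlashAttention. As a minimal sketch of how a model from this directory is typically used, the example below loads GPT-2 weights into the `GPTLMHeadModel` class from `gpt.py`; the exact `from_pretrained` signature and the `use_flash_attn` config attribute are assumptions drawn from common usage of this repository, not from the listing above.

```python
# Hedged sketch: loading HF GPT-2 weights into flash_attn's GPT implementation.
# The from_pretrained(model_name, config, ...) signature and the use_flash_attn
# flag are assumptions; check flash_attn/models/gpt.py for the actual API.
import torch
from transformers import GPT2Config

from flash_attn.models.gpt import GPTLMHeadModel

config = GPT2Config.from_pretrained("gpt2")
config.use_flash_attn = True  # assumed flag routing attention through FlashAttention

# Remap and load the Hugging Face checkpoint into the flash_attn GPT model.
model = GPTLMHeadModel.from_pretrained("gpt2", config, device="cuda", dtype=torch.float16)
model.eval()

input_ids = torch.randint(0, config.vocab_size, (1, 16), device="cuda")
with torch.no_grad():
    logits = model(input_ids).logits  # shape: (batch, seqlen, vocab_size)
```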