flash-attention/tests/models
| File | Last commit message | Last commit date |
| --- | --- | --- |
| test_baichuan.py | Pass alibi slopes to flash_attn_with_kvcache during generation | 2023-12-24 20:31:59 -08:00 |
| test_bert.py | Add BigCode converters (#532) | 2023-09-10 17:24:50 -07:00 |
| test_bigcode.py | [Gen] Don't use ft_attention, use flash_attn_with_kvcache instead | 2023-09-18 15:29:06 -07:00 |
| test_btlm.py | Add test for BTLM init | 2023-12-25 15:16:27 -08:00 |
| test_falcon.py | [Gen] Don't use ft_attention, use flash_attn_with_kvcache instead | 2023-09-18 15:29:06 -07:00 |
| test_gpt_generation_parallel.py | [Llama] Fix some tests, add tests for Llama 2 and CodeLlama | 2023-09-20 23:36:46 -07:00 |
| test_gpt_neox.py | Add tests for Pythia, GPT-JT, and RedPajama models | 2023-09-13 01:10:39 -07:00 |
| test_gpt_parallel.py | Run isort and black on test files | 2023-08-18 20:59:35 -07:00 |
| test_gpt.py | [Gen] Simplify decode_speculative | 2023-09-19 22:20:22 -07:00 |
| test_gptj.py | [Gen] Don't use ft_attention, use flash_attn_with_kvcache instead | 2023-09-18 15:29:06 -07:00 |
| test_llama.py | [Llama] Fix some tests, add tests for Llama 2 and CodeLlama | 2023-09-20 23:36:46 -07:00 |
| test_opt.py | [Gen] Don't use ft_attention, use flash_attn_with_kvcache instead | 2023-09-18 15:29:06 -07:00 |
| test_vit.py | Run isort and black on test files | 2023-08-18 20:59:35 -07:00 |
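Several of these commits replace ft_attention with `flash_attn_with_kvcache` in the generation tests and pass ALiBi slopes through it. The sketch below is not taken from the test files; it is a minimal illustration of one decoding step with that call, assuming a flash_attn version whose `flash_attn_with_kvcache` accepts an `alibi_slopes` argument (added around the time of these commits), a CUDA GPU, and illustrative tensor shapes.

```python
# Minimal sketch (not from the repo): one incremental decoding step with
# flash_attn_with_kvcache and per-head ALiBi slopes. Shapes and values are
# illustrative; requires a CUDA GPU and fp16/bf16 tensors.
import torch
from flash_attn import flash_attn_with_kvcache

batch, nheads, headdim, max_seqlen = 2, 8, 64, 128
device, dtype = "cuda", torch.float16

# Pre-allocated KV cache and the number of tokens already stored per sequence.
k_cache = torch.zeros(batch, max_seqlen, nheads, headdim, device=device, dtype=dtype)
v_cache = torch.zeros_like(k_cache)
cache_seqlens = torch.full((batch,), 16, dtype=torch.int32, device=device)

# One ALiBi slope per head (geometric sequence as in the ALiBi paper), fp32.
alibi_slopes = torch.tensor(
    [2 ** (-8 * (i + 1) / nheads) for i in range(nheads)],
    dtype=torch.float32, device=device,
)

# Query/key/value for the single new token being decoded.
q = torch.randn(batch, 1, nheads, headdim, device=device, dtype=dtype)
k = torch.randn_like(q)
v = torch.randn_like(q)

# The kernel appends k/v to the cache in place and attends over the cached
# prefix plus the new token, applying the ALiBi bias.
out = flash_attn_with_kvcache(
    q, k_cache, v_cache, k=k, v=v,
    cache_seqlens=cache_seqlens, causal=True, alibi_slopes=alibi_slopes,
)
```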