flash-attention/tests
Latest commit: Fix Llama GQA/MQA (#546) by Kevin Hu (42832575d4), 2023-09-19
* Fix llama MQA
* Fix permute shape
* Update llama.py
Name                 Last commit message                                                    Date
layers/              Run isort and black on test files                                      2023-08-18
losses/              [CE] Implement CrossEntropyLoss in Triton                              2023-09-15
models/              Fix Llama GQA/MQA (#546)                                               2023-09-19
modules/             Run isort and black on test files                                      2023-08-18
ops/                 Run isort and black on test files                                      2023-08-18
pyproject.toml       Move pyproject.toml to flash-attn and tests dir to avoid PEP 517       2023-08-25
test_flash_attn.py   Swap seqlen_q, nheads for MQA when seqlen_q=1 for fwd (h/t Daniel H)   2023-09-18
test_rotary.py       [Rotary] Implement varlen rotary                                       2023-09-03