flash-attention/flash_attn/utils
Xuechen Li 0f7853c6a1
enable loading hf llama checkpoints for training (#446)
* prelim.
* add hf conversion fn.
* mlp.
* change name.
* fix bug.
* inverse permute.
* change comment.
* revert style changes.
* fix.
* add doc.
* revert.
* enable safe load.
* fix safe load.
* fix import.
* fix typing-related lints.
* fix ckpt loading logic.
* make single gpu work.
* test with parallel.
* ckpt format.
* enable pretrained state dict.
* remove unused imports.
* remove unused.
* mark idea related.
2023-08-15 08:33:15 -07:00
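
The commit above adds conversion between Hugging Face LLaMA checkpoints and flash-attn's GPT model. Below is a minimal sketch of that flow, assuming the helper names (llama_config_to_gpt2_config and remap_state_dict_hf_llama in flash_attn.models.llama, state_dict_from_pretrained in flash_attn.utils.pretrained) match the repo at this commit; the checkpoint name is only an illustrative example.

    import torch
    from transformers import LlamaConfig

    from flash_attn.models.gpt import GPTLMHeadModel
    from flash_attn.models.llama import llama_config_to_gpt2_config, remap_state_dict_hf_llama
    from flash_attn.utils.pretrained import state_dict_from_pretrained

    model_name = "meta-llama/Llama-2-7b-hf"  # hypothetical example checkpoint

    # Translate the HF LLaMA config into the GPT2-style config flash-attn's GPT model expects.
    config = llama_config_to_gpt2_config(LlamaConfig.from_pretrained(model_name))

    # Fetch the HF state dict (safetensors or .bin), then remap parameter names and
    # layouts (e.g. the QKV permutation) to flash-attn's naming scheme.
    state_dict = state_dict_from_pretrained(model_name, dtype=torch.float16)
    state_dict = remap_state_dict_hf_llama(state_dict, config)

    model = GPTLMHeadModel(config, dtype=torch.float16)
    model.load_state_dict(state_dict)
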
__init__.py      Add __init__.py files to subdirectories for installation   2022-11-17 16:55:44 -08:00
benchmark.py     [Benchmark] Add script to benchmark FlashAttention          2023-07-28 00:26:52 -10:00
distributed.py   [TP] Implement TensorParallel without sequence parallel     2023-01-07 13:45:22 -08:00
generation.py    [Gen] Minor tweak to allocate_inference_cache               2023-04-21 11:56:47 -07:00
pretrained.py    enable loading hf llama checkpoints for training (#446)     2023-08-15 08:33:15 -07:00