flash-attention

Author	SHA1	Message	Date
Tri Dao	f1a73d0740	Run isort and black on python files	2023-08-18 14:22:11 -07:00
Xuechen Li	0f7853c6a1	enable loading hf llama checkpoints for training (#446) * prelim. * add hf convertion fn. * mlp. * change name. * fix bug. * inverse permute. * change comment. * revert style changes. * fix. * add doc. * revert. * enable load safe. * fix safe load. * fix import. * fix typing-related lints. * fix ckpt loading logic. * make single gpu work. * test with parallel. * ckpt format. * enable pretrained state dict. * remove unused imports. * remove unused. * mark idea related.	2023-08-15 08:33:15 -07:00
Tri Dao	78b7a1dc18	[OPT] Load fp16 weights on CPU before moving to GPU	2023-01-22 17:01:32 -08:00
Tri Dao	f68d41ec77	[Gen] Add OPT to generation test	2023-01-17 19:59:06 -08:00
Tri Dao	7c2191542a	[Gen] Make generation work with Tensor Parallel	2023-01-15 11:34:27 -08:00
Tri Dao	11be742aa3	[Gen] Test generation with rotary embedding	2023-01-07 14:37:54 -08:00
Tri Dao	c6ecd40a59	Tweak CrossEntropyLoss to take process_group in init	2022-12-27 10:47:43 -08:00