flash-attention

Author	SHA1	Message	Date
Xuechen Li	0f7853c6a1	enable loading hf llama checkpoints for training (#446) * prelim. * add hf convertion fn. * mlp. * change name. * fix bug. * inverse permute. * change comment. * revert style changes. * fix. * add doc. * revert. * enable load safe. * fix safe load. * fix import. * fix typing-related lints. * fix ckpt loading logic. * make single gpu work. * test with parallel. * ckpt format. * enable pretrained state dict. * remove unused imports. * remove unused. * mark idea related.	2023-08-15 08:33:15 -07:00
Tri Dao	184b992dcb	[GPT] Implement parallel LLaMa	2023-07-28 15:52:48 -10:00
Tri Dao	56ccaff126	[GPT] Add LLaMa-13B to test	2023-07-26 07:22:22 -10:00
Tri Dao	8e9820a55b	[Rotary] Fix tests when loading state dict with rotary inv_freqs	2023-07-26 07:16:33 -10:00
Tri Dao	62e9814466	[Rotary] Make sure frequency calculation is in fp32	2023-07-02 16:39:39 -07:00
Tri Dao	a9a4b4e4f2	[LLaMa] Fix last norm layer to use RMSNorm instead of LayerNorm	2023-05-04 23:39:43 -07:00
Tri Dao	96d10f6545	Implement LLaMa	2023-04-18 21:51:35 -07:00