Commit Graph

15 Commits

Author SHA1 Message Date
Tri Dao
0705d2718d [Llama] Fix some tests, add tests for Llama 2 and CodeLlama 2023-09-20 23:36:46 -07:00
Kevin Hu
42832575d4 Fix Llama GQA/MQA (#546)
* Fix llama MQA
* Fix permute shape
* Update llama.py
2023-09-19 22:15:59 -07:00
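
Context for this fix: in grouped-query attention (GQA) several query heads share one KV head, and multi-query attention (MQA) is the extreme case of a single KV head. A minimal sketch of the head-repetition step such models need (illustrative only; `repeat_kv` and its shapes are assumptions, not the repo's actual code):

```python
import torch

def repeat_kv(kv: torch.Tensor, n_rep: int) -> torch.Tensor:
    """Expand KV heads for GQA/MQA.

    kv: (batch, seqlen, n_kv_heads, head_dim). Each KV head serves
    n_rep query heads, so repeat it n_rep times along the head axis.
    """
    if n_rep == 1:  # plain multi-head attention: nothing to do
        return kv
    batch, seqlen, n_kv_heads, head_dim = kv.shape
    return (
        kv[:, :, :, None, :]
        .expand(batch, seqlen, n_kv_heads, n_rep, head_dim)
        .reshape(batch, seqlen, n_kv_heads * n_rep, head_dim)
    )
```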
Tri Dao
dfe29f5e2b [Gen] Don't use ft_attention, use flash_attn_with_kvcache instead 2023-09-18 15:29:06 -07:00
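
flash_attn_with_kvcache is the library's public decoding entry point: it can append new keys/values into a preallocated cache and attend over the cache in a single call, replacing the separate ft_attention path. A minimal single-token decoding step (the shapes and values here are illustrative assumptions):

```python
import torch
from flash_attn import flash_attn_with_kvcache

batch, nheads, headdim, max_seqlen = 2, 16, 64, 2048
q = torch.randn(batch, 1, nheads, headdim, dtype=torch.float16, device="cuda")
k_cache = torch.zeros(batch, max_seqlen, nheads, headdim,
                      dtype=torch.float16, device="cuda")
v_cache = torch.zeros_like(k_cache)
k_new = torch.randn(batch, 1, nheads, headdim, dtype=torch.float16, device="cuda")
v_new = torch.randn_like(k_new)
# Number of tokens already stored in the cache for each batch element
cache_seqlens = torch.full((batch,), 10, dtype=torch.int32, device="cuda")

# Writes k_new/v_new into the cache at position cache_seqlens, then attends
out = flash_attn_with_kvcache(q, k_cache, v_cache, k=k_new, v=v_new,
                              cache_seqlens=cache_seqlens, causal=True)
```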
Tri Dao
8a733cbd53 [Gen] Fix calling update_graph_cache in tests 2023-09-10 17:22:37 -07:00
Tri Dao
913922cac5 [Gen] Refactor decoding function 2023-09-04 17:01:38 -07:00
Tri Dao
0e8c46ae08 Run isort and black on test files 2023-08-18 20:59:35 -07:00
Xuechen Li
7fcd3e6a04 map custom model state_dict back to huggingface format (#465)
* fix name.
* set inv function.
* add map back function.
* handle gqa.
* add type annotation to avoid confusion.
* fix docstr.
* test inverse remap logic.
2023-08-18 20:51:39 -07:00
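
The idea here is a key-level inverse of the renaming applied when importing HF weights, so models trained in this repo can be exported back to the HuggingFace Llama layout. A hypothetical sketch of such a remap (the key patterns below are illustrative, not the exact mapping in PR #465):

```python
import re

def remap_to_hf(state_dict: dict) -> dict:
    # Each rule undoes one rename from the HF -> custom import direction.
    rules = [
        (r"^transformer\.embeddings\.word_embeddings\.", "model.embed_tokens."),
        (r"^transformer\.layers\.(\d+)\.norm1\.", r"model.layers.\1.input_layernorm."),
        (r"^transformer\.layers\.(\d+)\.norm2\.", r"model.layers.\1.post_attention_layernorm."),
        (r"^transformer\.ln_f\.", "model.norm."),
    ]
    out = {}
    for key, tensor in state_dict.items():
        for pattern, repl in rules:
            key = re.sub(pattern, repl, key)
        out[key] = tensor
    return out
```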
Xuechen Li
bb4cded17b support when num_heads is not divisible by world_size; resolves #459 (#461)
* unequal rank.
* trim.
* enable passing in number of heads for each rank.
* simplify.
* simplify.
* cleanup.
* fix col parallel.
* fix bug with row parallel.
* fix out proj.
* refactor.
* fix sharding logic.
* refactor sharding.
* refactor.
* support multiple of.
* make fn reusable.
* fix bug in dimensions.
* scaffold.
* test uneven heads.
* fix test by adding barrier.
* refactor.
* reuse code.
* clean up.
2023-08-18 14:10:35 -07:00
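
One simple policy for the uneven case is to give the first `num_heads % world_size` ranks one extra head each. A sketch of that idea (the function name and policy are assumptions, not necessarily what the PR implements):

```python
def heads_for_rank(num_heads: int, world_size: int, rank: int) -> int:
    # Distribute the remainder heads to the lowest-numbered ranks.
    base, remainder = divmod(num_heads, world_size)
    return base + (1 if rank < remainder else 0)

# Example: 30 heads over 4 ranks -> [8, 8, 7, 7]
assert [heads_for_rank(30, 4, r) for r in range(4)] == [8, 8, 7, 7]
```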
Xuechen Li
0f7853c6a1 enable loading hf llama checkpoints for training (#446)
* prelim.
* add hf conversion fn.
* mlp.
* change name.
* fix bug.
* inverse permute.
* change comment.
* revert style changes.
* fix.
* add doc.
* revert.
* enable load safe.
* fix safe load.
* fix import.
* fix typing-related lints.
* fix ckpt loading logic.
* make single gpu work.
* test with parallel.
* ckpt format.
* enable pretrained state dict.
* remove unused imports.
* remove unused.
* mark idea related.
2023-08-15 08:33:15 -07:00
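
The "inverse permute" item refers to undoing the rotary-dimension interleaving that HuggingFace's convert_llama_weights_to_hf.py applies to the q_proj/k_proj weights. A sketch of that inverse (the helper name is ours; the forward permute it undoes is the one in the HF conversion script):

```python
import torch

def inv_permute(w: torch.Tensor, n_heads: int) -> torch.Tensor:
    # Inverse of HF's permute:
    #   w.view(n_heads, 2, d, dim2).transpose(1, 2).reshape(dim1, dim2)
    # where d = dim1 // n_heads // 2.
    dim1, dim2 = w.shape
    d = dim1 // n_heads // 2
    return w.view(n_heads, d, 2, dim2).transpose(1, 2).reshape(dim1, dim2)
```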
Tri Dao
184b992dcb [GPT] Implement parallel LLaMa 2023-07-28 15:52:48 -10:00
Tri Dao
56ccaff126 [GPT] Add LLaMa-13B to test 2023-07-26 07:22:22 -10:00
Tri Dao
8e9820a55b [Rotary] Fix tests when loading state dict with rotary inv_freqs 2023-07-26 07:16:33 -10:00
Tri Dao
62e9814466 [Rotary] Make sure frequency calculation is in fp32 2023-07-02 16:39:39 -07:00
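
Computing base ** (i / dim) in fp16 or bf16 loses precision, which skews the rotary angles at large positions; the fix keeps the frequency math in fp32. A generic sketch of the computation (not the repo's exact code):

```python
import torch

def rotary_inv_freq(dim: int, base: float = 10000.0) -> torch.Tensor:
    # arange is created directly in fp32 so the power and division
    # never pass through a low-precision dtype.
    return 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
```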
Tri Dao
a9a4b4e4f2 [LLaMa] Fix last norm layer to use RMSNorm instead of LayerNorm 2023-05-04 23:39:43 -07:00
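
RMSNorm, which LLaMa uses throughout including the final norm this commit fixes, rescales by the root mean square only: no mean subtraction and no bias, unlike LayerNorm. A minimal reference sketch:

```python
import torch

class RMSNorm(torch.nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = torch.nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize by rsqrt(mean(x^2) + eps) in fp32, then rescale.
        rms = x.float().pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return (x.float() * rms).type_as(x) * self.weight
```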
Tri Dao
96d10f6545 Implement LLaMa 2023-04-18 21:51:35 -07:00