Commit Graph

16 Commits

Author         SHA1        Date                        Message
ferdinand.mom  8af19d0caa  2024-11-04 15:29:26 +00:00  picotron top level folder
ferdinand.mom  41f49bb15f  2024-11-04 15:06:29 +00:00  rename to grad_steps
ferdinand.mom  1dbe034d57  2024-10-30 14:58:41 +00:00  better config creation
ferdinand.mom  47c00be8c7  2024-10-29 15:44:35 +00:00  breaking: add slurm stuff
zzhhjjj        b7f3e253be  2024-10-29 13:42:38 +00:00  add context parallel
zzhhjjj        6220892716  2024-10-28 20:44:15 +00:00  refactor
zzhhjjj        2f8c87f4d1  2024-10-28 05:19:59 +00:00  save/load weights
zzhhjjj        63307c79a1  2024-10-23 00:38:27 +00:00  add some logs, refactor dataloader
zzhhjjj        24ff8d05fd  2024-10-16 16:48:55 +00:00  add DDP
ferdinand.mom  1e229cae88  2024-10-14 09:26:31 +00:00  renaming
ferdinand.mom  3095ff4d4f  2024-10-10 15:12:14 +00:00  refactor organisation
ferdinand.mom  31b5fb9efc  2024-09-26 13:45:53 +00:00  ugly ass display of grid (to be changed)
ferdinand.mom  b8065de7aa  2024-09-26 10:27:20 +00:00  support CPU training through gloo backend
ferdinand.mom  b2e276d3b8  2024-09-25 13:33:20 +00:00  rename parallel_context to process_group_manager
ferdinand.mom  7ba1383ebb  2024-09-24 13:43:22 +00:00  fixing socket bug by using dist.new_subgroups_by_enumeration instead
ferdinand.mom  c36d415b47  2024-09-19 14:06:46 +00:00  add training and generate for pp