ferdinand.mom | 1dbe034d57 | better config creation | 2024-10-30 14:58:41 +00:00
ferdinand.mom | 47c00be8c7 | breaking: add slurm stuff | 2024-10-29 15:44:35 +00:00
zzhhjjj | b7f3e253be | add context parallel | 2024-10-29 13:42:38 +00:00
zzhhjjj | 6220892716 | refactor | 2024-10-28 20:44:15 +00:00
zzhhjjj | 2f8c87f4d1 | save/load weights | 2024-10-28 05:19:59 +00:00
zzhhjjj | 63307c79a1 | add some logs, refactor dataloader | 2024-10-23 00:38:27 +00:00
zzhhjjj | 24ff8d05fd | add DDP | 2024-10-16 16:48:55 +00:00
ferdinand.mom | 1e229cae88 | renaming | 2024-10-14 09:26:31 +00:00
ferdinand.mom | 3095ff4d4f | refactor organisation | 2024-10-10 15:12:14 +00:00
ferdinand.mom | 31b5fb9efc | ugly ass display of grid (to be changed) | 2024-09-26 13:45:53 +00:00
ferdinand.mom | b8065de7aa | support CPU training through gloo backend | 2024-09-26 10:27:20 +00:00
ferdinand.mom | b2e276d3b8 | rename parallel_context to process_group_manager | 2024-09-25 13:33:20 +00:00
ferdinand.mom | 7ba1383ebb | fixing socket bug by using dist.new_subgroups_by_enumeration instead | 2024-09-24 13:43:22 +00:00
ferdinand.mom | c36d415b47 | add training and generate for pp | 2024-09-19 14:06:46 +00:00