Commit Graph

3 Commits

Author         SHA1        Date                        Message
ferdinand.mom  8e36bbe032  2024-11-03 00:14:14 +00:00  fix multi-node training by using global rank instead of local rank to init process_group
ferdinand.mom  bd6b8a0972  2024-11-02 02:18:40 +00:00  add hf token + fix multi-node training with torchrun args
ferdinand.mom  f74bff79e0  2024-10-30 14:58:41 +00:00  cleaning
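Commit 8e36bbe032 fixes multi-node training by passing the global rank, not the local rank, when initializing the process group. Under torchrun, each worker receives LOCAL_RANK (its index on its own node) and RANK (its index across the whole job). Local ranks repeat on every node, so using them as the process-group rank gives several processes the same rank. The Python sketch below only illustrates that difference. The helpers `read_dist_info` and `simulate_env` are hypothetical, the repository's actual code is not shown here, and the simulated environment assumes a homogeneous 2-node, 2-process-per-node job launched with a fixed `--node_rank` per node.

```python
import os
from typing import Dict, Mapping, NamedTuple


class DistInfo(NamedTuple):
    global_rank: int  # torchrun's RANK: unique across all nodes
    local_rank: int   # torchrun's LOCAL_RANK: index of the process on its own node
    world_size: int   # torchrun's WORLD_SIZE: total number of processes in the job


def read_dist_info(env: Mapping[str, str] = os.environ) -> DistInfo:
    """Hypothetical helper: read the rank variables torchrun exports to each worker."""
    return DistInfo(
        global_rank=int(env["RANK"]),
        local_rank=int(env["LOCAL_RANK"]),
        world_size=int(env["WORLD_SIZE"]),
    )


def simulate_env(node_rank: int, local_rank: int,
                 nproc_per_node: int, nnodes: int) -> Dict[str, str]:
    """Hypothetical helper: assumed env for a homogeneous job with a fixed --node_rank."""
    return {
        "RANK": str(node_rank * nproc_per_node + local_rank),
        "LOCAL_RANK": str(local_rank),
        "WORLD_SIZE": str(nnodes * nproc_per_node),
    }


if __name__ == "__main__":
    nnodes, nproc = 2, 2
    infos = [read_dist_info(simulate_env(n, l, nproc, nnodes))
             for n in range(nnodes) for l in range(nproc)]
    # Local ranks repeat on every node, so they cannot identify a process globally.
    print("local ranks :", [i.local_rank for i in infos])
    # Global ranks are unique, which is what the process group needs.
    print("global ranks:", [i.global_rank for i in infos])
    # In a real training script (requires torch), this would become:
    #   dist.init_process_group(backend="nccl", rank=info.global_rank,
    #                           world_size=info.world_size)
    #   torch.cuda.set_device(info.local_rank)  # local rank still picks the GPU
```

Running it prints local ranks [0, 1, 0, 1] and global ranks [0, 1, 2, 3]; only the global ranks give each of the four processes a distinct identity, while the local rank remains the right value for choosing a GPU on the node.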