Commit Graph

5 Commits

Author         SHA1        Message                                                Date
ferdinand.mom  8c155f47ce  all reduce gradient across DP & CP ranks               2024-09-26 14:00:06 +00:00
ferdinand.mom  b8065de7aa  support CPU training through gloo backend              2024-09-26 10:27:20 +00:00
ferdinand.mom  b2e276d3b8  rename parallel_context to process_group_manager       2024-09-25 13:33:20 +00:00
ferdinand.mom  9e9ef8236e  refactor to decouple pp training with normal training  2024-09-25 13:17:05 +00:00
ferdinand.mom  e2c0747fe3  add naive DP                                           2024-09-25 12:36:22 +00:00