flash-attention/training/configs/optimizer/adamw-zero.yaml

# @package train.optimizer
_target_: torch.distributed.optim.ZeroRedundancyOptimizer
_recursive_: True
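# Hydra calls torch.optim.__getattribute__("AdamW"), which returns the AdamW
# class itself (not an instance); it is passed on as optimizer_class.
optimizer_class:
  _target_: torch.optim.__getattribute__
  _args_:
    - "AdamW"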
Release training code 2022-11-29 09:31:19 +08:00
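
The `optimizer_class` entry relies on a small trick: the target is a module's `__getattribute__`, so calling it with the positional argument `"AdamW"` returns the class rather than an optimizer instance. The sketch below mimics that lookup with the standard library only. The helpers `locate` and `instantiate_positional` are hypothetical, simplified stand-ins and not Hydra's actual implementation, and `collections.OrderedDict` stands in for `torch.optim.AdamW` so the example runs without PyTorch.

```python
import collections
import importlib


def locate(path: str):
    """Resolve a dotted path: import the longest importable module prefix,
    then walk the remaining parts as attributes (simplified stand-in)."""
    parts = path.split(".")
    for i in range(len(parts), 0, -1):
        try:
            obj = importlib.import_module(".".join(parts[:i]))
        except ImportError:
            continue  # prefix is not a module; try a shorter one
        for name in parts[i:]:
            obj = getattr(obj, name)
        return obj
    raise ImportError(f"cannot resolve {path!r}")


def instantiate_positional(node: dict):
    """Call the resolved target with the positional arguments in `_args_`."""
    target = locate(node["_target_"])
    return target(*node.get("_args_", []))


# Same shape as the optimizer_class node, with a stdlib module swapped in
# for torch.optim (an assumption made to keep the example dependency-free).
node = {"_target_": "collections.__getattribute__", "_args_": ["OrderedDict"]}
resolved = instantiate_positional(node)

# The result is the class object itself, not an instance of it.
print(resolved is collections.OrderedDict)  # prints True
```

Under the same reasoning, the config's inner node would yield the `torch.optim.AdamW` class, which the outer `ZeroRedundancyOptimizer` target then receives as its `optimizer_class` argument.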