when using checkpoint_lvl=2, we all_gather_raw(x) without async_op=True. So we don't need to wait for handle. Just skip. |
||
|---|---|---|
| .. | ||
| triton | ||
| __init__.py | ||
| activations.py | ||
| fused_dense.py | ||
| layer_norm.py | ||
| rms_norm.py | ||
when using checkpoint_lvl=2, we all_gather_raw(x) without async_op=True. So we don't need to wait for handle. Just skip. |
||
|---|---|---|
| .. | ||
| triton | ||
| __init__.py | ||
| activations.py | ||
| fused_dense.py | ||
| layer_norm.py | ||
| rms_norm.py | ||