Tri Dao | bd82d6c6eb | Revert "[LayerNorm] Don't store x + residual if we don't need gradients" (This reverts commit 800401847e.) | 2024-08-15 12:02:39 -07:00
Tri Dao | 800401847e | [LayerNorm] Don't store x + residual if we don't need gradients | 2024-08-15 11:08:46 -07:00
Tri Dao | 36587c01cb | [LayerNorm] Update layer_norm_linear | 2024-03-18 23:15:33 -07:00
Tri Dao | bdcae547c7 | [LayerNorm] Don't exit early in the backward pass (fix #781) | 2024-01-22 22:40:06 -08:00
Tri Dao | c9861a032d | [LayerNorm] Initialize mean and rstd tensor using x.device | 2024-01-09 16:30:31 -08:00
Tri Dao | f5b308e258 | [LayerNorm] Rename layernorm.py -> layer_norm.py | 2024-01-05 00:21:03 -08:00