| Tri Dao | 88173a1aaf | [FusedDense] Support relu, rename FusedDenseGeluDense -> FusedMLP | 2023-01-17 18:12:27 -08:00 |
| Tri Dao | ce26d3d73d | Bump to v0.2.7 | 2023-01-06 17:37:30 -08:00 |
| Tri Dao | 71befc19e1 | [Loss] Use flash_attn.losses.cross_entropy.CrossEntropyLoss | 2022-12-31 22:43:28 -08:00 |
| Tri Dao | cadfa396b8 | [Docker] Set torchmetrics==0.10.3 | 2022-12-30 02:42:28 -08:00 |
| Tri Dao | 43798966cf | [Docs] Fix formatting | 2022-12-30 00:01:55 -08:00 |
| Tri Dao | 3c7cbfc195 | [Docs] Mention that dropout_layer_norm supports all dims up to 6k | 2022-12-29 23:55:33 -08:00 |
| Tri Dao | 984d5204e2 | Update training Dockerfile to use flash-attn==0.2.6 | 2022-12-29 15:12:33 -08:00 |
| Tri Dao | b4018a5028 | Implement Tensor Parallel for GPT model | 2022-12-26 16:22:43 -08:00 |
| Tri Dao | dff68c2b22 | Add smoothing for CrossEntropyParallel, rename to CrossEntropyLoss | 2022-12-23 14:51:08 -08:00 |
| Tri Dao | c2407dec96 | Fix typo in config: train.gpu -> train.gpu_mem | 2022-12-21 13:42:30 -08:00 |
| Tri Dao | 4a6eaa9f27 | Update configs, add results | 2022-11-29 04:46:43 -08:00 |
| Tri Dao | 0bf5e50038 | Release training code | 2022-11-28 17:34:40 -08:00 |