Commit Graph

10 Commits

Author   SHA1        Message                                                             Date
Tri Dao  71befc19e1  [Loss] Use flash_attn.losses.cross_entropy.CrossEntropyLoss         2022-12-31 22:43:28 -08:00
Tri Dao  cadfa396b8  [Docker] Set torchmetrics==0.10.3                                   2022-12-30 02:42:28 -08:00
Tri Dao  43798966cf  [Docs] Fix formatting                                               2022-12-30 00:01:55 -08:00
Tri Dao  3c7cbfc195  [Docs] Mention that dropout_layer_norm supports all dims up to 6k   2022-12-29 23:55:33 -08:00
Tri Dao  984d5204e2  Update training Dockerfile to use flash-attn==0.2.6                 2022-12-29 15:12:33 -08:00
Tri Dao  b4018a5028  Implement Tensor Parallel for GPT model                             2022-12-26 16:22:43 -08:00
Tri Dao  dff68c2b22  Add smoothing for CrossEntropyParallel, rename to CrossEntropyLoss  2022-12-23 14:51:08 -08:00
Tri Dao  c2407dec96  Fix typo in config: train.gpu -> train.gpu_mem                      2022-12-21 13:42:30 -08:00
Tri Dao  4a6eaa9f27  Update configs, add results                                         2022-11-29 04:46:43 -08:00
Tri Dao  0bf5e50038  Release training code                                               2022-11-28 17:34:40 -08:00