2024-11-04 15:56:31 +01:00
| Name | Last commit message | Date |
|---|---|---|
| src | some dp renaming | 2024-11-04 14:48:12 +00:00 |
| template | set num workers to 1 for now to avoid os memory error | 2024-11-04 14:39:52 +00:00 |
| .gitignore | tesnsor parallel, will clean later | 2024-10-18 05:13:44 +00:00 |
| convert_hf_to_picotron.py | various fix (modeling, dataloader, cpu load) | 2024-10-18 14:33:46 +00:00 |
| convert_picotron_to_hf.py | refactor organisation | 2024-10-10 15:12:14 +00:00 |
| create_config.py | add fuse adam | 2024-11-04 14:35:36 +00:00 |
| generate.py | various fix (modeling, dataloader, cpu load) | 2024-10-18 14:33:46 +00:00 |
| model.py | fix spliting input twice for context parallel (done in dataloader) | 2024-10-30 15:43:42 +00:00 |
| README.md | Initial commit | 2024-09-18 14:01:22 +02:00 |
| requirements.txt | fix requirements to avoid drop in throughput | 2024-11-04 14:33:07 +00:00 |
| setup.py | tesnsor parallel, will clean later | 2024-10-18 05:13:44 +00:00 |
| submit_slurm_jobs.py | add option for HF token | 2024-11-04 14:39:12 +00:00 |
| train.py | Merge branch 'main' into add-grad-acc-pp | 2024-11-04 15:56:31 +01:00 |
| utils.py | fix multi-node training by using global rank instead of local rank to init process_group | 2024-11-03 00:14:14 +00:00 |

picotron