2024-12-18 15:51:04 +00:00
picotron Revert to @zzhhjjj class naming as it is more expressive 2024-12-17 15:55:18 +00:00
template breaking: add new version of init meta device, but it leaks memory 2024-12-17 15:46:16 +00:00
tests Revert to @zzhhjjj class naming as it is more expressive 2024-12-17 15:55:18 +00:00
.gitignore small changes 2024-12-17 05:01:35 +00:00
create_config.py revert to using huggingface cli + hf_transfer (this will not create snapshots/blob folder etc. through CLI use) 2024-12-18 15:51:04 +00:00
extract_metrics.py add mfu parsing 2024-12-04 13:08:28 +00:00
README.md Initial commit 2024-09-18 14:01:22 +02:00
requirements.txt use hf_transfer, which improves download time by 3x 2024-12-18 14:51:14 +00:00
setup.py tensor parallel, will clean later 2024-10-18 05:13:44 +00:00
submit_slurm_jobs.py can now load big model through safetensors (sharded and single file) 2024-12-01 19:39:16 +00:00
train.py download safetensors at config creation time. If we do it in training, barrier() may time out while waiting for the download 2024-12-17 15:46:16 +00:00

picotron