| File | Last commit message | Last commit date |
| --- | --- | --- |
| picotron | remove clone() in tp communications as torch.compile will optimize this out anyway | 2024-12-03 16:26:41 +00:00 |
| template | raise Exception when not enough layers to distribute per rank + rename variable | 2024-12-03 13:17:52 +00:00 |
| .gitignore | picotron top level folder | 2024-11-04 15:29:26 +00:00 |
| create_config.py | raise Exception when not enough layers to distribute per rank + rename variable | 2024-12-03 13:17:52 +00:00 |
| extract_metrics.py | add MFU parsing | 2024-12-04 13:08:28 +00:00 |
| README.md | Initial commit | 2024-09-18 14:01:22 +02:00 |
| requirements.txt | fix requirements to avoid drop in throughput | 2024-11-04 14:33:07 +00:00 |
| setup.py | tensor parallel, will clean later | 2024-10-18 05:13:44 +00:00 |
| submit_slurm_jobs.py | can now load big model through safetensors (sharded and single file) | 2024-12-01 19:39:16 +00:00 |
| train.py | broadcast tokenizer to every rank as well | 2024-12-03 14:20:44 +00:00 |

# picotron