# picotron
In the spirit of [NanoGPT](https://github.com/karpathy/nanoGPT), we created Picotron: a minimalist, highly hackable repository for pre-training Llama-like models with [4D Parallelism](https://arxiv.org/abs/2407.21783) (data, tensor, pipeline, and context parallelism). It is designed with simplicity and **educational** purposes in mind, making it an excellent tool for learning and experimentation.
![](assets/banière.png)
- The code itself is simple and readable: `train.py`, `model.py` and `[data|tensor|pipeline|context]_parallel.py` are all under **300** lines of code.
- Performance is not yet optimal but is under active development: we observed 38% MFU on a LLaMA-2-7B model with 64 H100 GPUs and nearly 50% MFU on SmolLM-1.7B with 8 H100 GPUs. Benchmarks will come soon.
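
The four parallel dimensions compose multiplicatively: the number of processes `torchrun` launches is the product of the data, tensor, pipeline, and context parallel degrees, and every rank owns one coordinate along each axis. Below is a purely illustrative sketch of that decomposition; it makes no assumptions about picotron's internal process-group layout or axis ordering.

```python
# Illustrative only: how a global rank maps to coordinates along the four
# parallel axes (the axis order here is arbitrary, not picotron's actual layout).
import itertools

dp, cp, pp, tp = 2, 1, 2, 2        # example degrees
world_size = dp * cp * pp * tp     # processes torchrun must launch (here 8)

for rank, (dp_rank, cp_rank, pp_rank, tp_rank) in enumerate(
    itertools.product(range(dp), range(cp), range(pp), range(tp))
):
    print(f"rank {rank}: dp={dp_rank} cp={cp_rank} pp={pp_rank} tp={tp_rank}")
```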
# Install
```
pip install -e .
```
# Quick start
- Get an HF token [here](https://huggingface.co/settings/tokens) to download models from the Hugging Face Hub.
- GPU
```sh
# Create a config file in JSON format (written under tmp/ by default)
python create_config.py --out_dir tmp --exp_name llama-1B --dp 8 --model_name HuggingFaceTB/SmolLM-1.7B --num_hidden_layers 15 --grad_acc_steps 32 --mbs 4 --seq_len 1024 --hf_token <HF_TOKEN>
# Launch the llama-1B run locally on 8 GPUs
torchrun --nproc_per_node 8 train.py --config tmp/llama-1B/config.json
# 3D parallelism for llama-7B (dp 4 x tp 2 x pp 2 = 16 GPUs)
python create_config.py --out_dir tmp --dp 4 --tp 2 --pp 2 --pp_engine 1f1b --exp_name llama-7B --model_name meta-llama/Llama-2-7b-hf --grad_acc_steps 32 --mbs 4 --seq_len 1024 --hf_token <HF_TOKEN>
# Submit the llama-7B job via Slurm
python submit_slurm_jobs.py --inp_dir tmp/llama-7B --qos high --hf_token <HF_TOKEN>
```
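
The config written by `create_config.py` is plain JSON, so you can inspect or tweak it before launching. A minimal sketch, assuming only the output path from the first example above (the exact set of fields is whatever `create_config.py` writes):

```python
import json

# Path produced by the first create_config.py command above.
with open("tmp/llama-1B/config.json") as f:
    cfg = json.load(f)

print(json.dumps(cfg, indent=2))   # review every field before training

# Optionally edit values, then write the file back:
# with open("tmp/llama-1B/config.json", "w") as f:
#     json.dump(cfg, f, indent=2)
```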
- CPU (expect it to be slow)
```sh
# 3D parallelism on CPU (dp 2 x tp 2 x pp 2 = 8 processes)
python create_config.py --out_dir tmp --exp_name llama-1B-cpu --dp 2 --tp 2 --pp 2 --pp_engine 1f1b --model_name HuggingFaceTB/SmolLM-1.7B --num_hidden_layers 5 --grad_acc_steps 2 --mbs 4 --seq_len 128 --hf_token <HF_TOKEN> --use_cpu
# Launch locally with 8 CPU processes
torchrun --nproc_per_node 8 train.py --config tmp/llama-1B-cpu/config.json
```
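
Whether on GPU or CPU, the value passed to `--nproc_per_node` should equal the product of the parallel degrees in the config; for the CPU example above that is 2 x 2 x 2 = 8. A tiny sanity-check sketch (degrees copied from the command above):

```python
# Degrees from the CPU example above: --dp 2 --tp 2 --pp 2
dp, tp, pp = 2, 2, 2
world_size = dp * tp * pp          # must match --nproc_per_node (here 8)
print(f"torchrun --nproc_per_node {world_size} train.py --config tmp/llama-1B-cpu/config.json")
```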
# Acknowledgements
- [Megatron-LM](https://github.com/NVIDIA/Megatron-LM)
- [FairScale](https://github.com/facebookresearch/fairscale)
- [LitGPT](https://github.com/Lightning-AI/lit-gpt)