Guide / FAQ
Slurm configuration
Most scripts
For most scripts, you first need to know how many GPUs your tasks require. Set --ntasks-per-node
to the number of GPUs you need per node, then launch the Python file with srun.
Here is an example with 4 GPUs:
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --output=example.out
#SBATCH --error=example.out
#SBATCH --gres=gpu:4
#SBATCH --ntasks-per-node=4
#SBATCH --nodes=1
#SBATCH --hint=nomultithread
#SBATCH --time=00:50:00
#SBATCH --qos=qos_gpu-dev
#SBATCH --cpus-per-task=8
#SBATCH --account=example@a100
#SBATCH -C a100
module purge
module load cpuarch/amd
module load pytorch-gpu/py3/2.0.1
srun python example.py
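The contents of example.py are not shown in this guide. As a hypothetical sketch, a script launched this way can recover its distributed-training identity from the environment variables Slurm sets for each task started by srun (SLURM_PROCID, SLURM_NTASKS, SLURM_LOCALID are standard Slurm names; the function name is illustrative):

```python
import os

def slurm_dist_info(env=None):
    """Derive the identifiers a distributed training script needs from
    the variables Slurm sets for every task launched by srun."""
    env = os.environ if env is None else env
    rank = int(env.get("SLURM_PROCID", 0))         # global rank of this task
    world_size = int(env.get("SLURM_NTASKS", 1))   # total number of tasks
    local_rank = int(env.get("SLURM_LOCALID", 0))  # rank of the task on its node
    return rank, world_size, local_rank
```

With --nodes=1 --ntasks-per-node=4, srun starts 4 tasks and the third one sees SLURM_PROCID=2, SLURM_NTASKS=4, SLURM_LOCALID=2; these values are typically passed to the process-group initialization of your framework.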
Here is another example with 16 GPUs (8 GPUs per node on 2 nodes):
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --output=example.out
#SBATCH --error=example.out
#SBATCH --gres=gpu:8
#SBATCH --ntasks-per-node=8
#SBATCH --nodes=2
#SBATCH --hint=nomultithread
#SBATCH --time=00:50:00
#SBATCH --qos=qos_gpu-dev
#SBATCH --cpus-per-task=8
#SBATCH --account=example@a100
#SBATCH -C a100
module purge
module load cpuarch/amd
module load pytorch-gpu/py3/2.0.1
srun python example.py
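In the multi-node case above, each task's global rank follows from its node index and its local rank on that node. A small sketch of that arithmetic (the function name is illustrative):

```python
def global_rank(node_id, local_id, tasks_per_node):
    """Global rank of a task in a multi-node srun launch.
    With --nodes=2 --ntasks-per-node=8, ranks run from 0 to 15."""
    return node_id * tasks_per_node + local_id

# Node 1, local task 3, 8 tasks per node -> global rank 11
```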
Inference with accelerate
For inference with accelerate, --ntasks-per-node
must be set to 1, whatever the number of GPUs you use. Example with 4 GPUs:
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --output=example.out
#SBATCH --error=example.out
#SBATCH --gres=gpu:4
#SBATCH --ntasks-per-node=1
#SBATCH --nodes=1
#SBATCH --hint=nomultithread
#SBATCH --time=00:50:00
#SBATCH --qos=qos_gpu-dev
#SBATCH --cpus-per-task=8
#SBATCH --account=example@a100
#SBATCH -C a100
module purge
module load cpuarch/amd
module load pytorch-gpu/py3/2.0.1
srun python example.py
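With --ntasks-per-node=1, the single srun task sees all 4 allocated GPUs and drives them itself, rather than Slurm starting one task per GPU. As an illustrative stdlib-only sketch (the function name is hypothetical), one way such a process can spread an inference workload over its visible devices is a round-robin split:

```python
def shard_prompts(prompts, num_devices):
    """Round-robin split of an inference workload across the GPUs
    visible to the single srun task (--ntasks-per-node=1)."""
    shards = [[] for _ in range(num_devices)]
    for i, prompt in enumerate(prompts):
        shards[i % num_devices].append(prompt)
    return shards

# 6 prompts over 4 GPUs -> shards of sizes [2, 2, 1, 1]
```

In practice accelerate handles this device placement for you; the sketch only shows why a single task per node suffices.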
DDP with accelerate
For DDP with accelerate, --ntasks-per-node
must also be set to 1, whatever the number of GPUs you use. You can also use idr_accelerate
to launch your script: it replaces accelerate launch
and automatically creates a config file from the Slurm parameters. This config file is saved in the .accelerate_config
directory. Example with 16 GPUs:
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --output=example.out
#SBATCH --error=example.out
#SBATCH --gres=gpu:8
#SBATCH --ntasks-per-node=1
#SBATCH --nodes=2
#SBATCH --hint=nomultithread
#SBATCH --time=00:50:00
#SBATCH --qos=qos_gpu-dev
#SBATCH --cpus-per-task=8
#SBATCH --account=example@a100
#SBATCH -C a100
module purge
module load cpuarch/amd
module load pytorch-gpu/py3/2.0.1
srun idr_accelerate example.py
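idr_accelerate derives the launch configuration from the Slurm environment. A simplified sketch of that mapping, in the spirit of the tool (the exact keys and format idr_accelerate writes may differ; SLURM_NNODES, SLURM_GPUS_ON_NODE, and SLURM_NODEID are standard Slurm variables):

```python
def accelerate_config_from_slurm(env):
    """Sketch of turning Slurm variables into multi-node launch
    parameters, as a tool like idr_accelerate might do."""
    num_nodes = int(env["SLURM_NNODES"])
    gpus_per_node = int(env["SLURM_GPUS_ON_NODE"])
    return {
        "num_machines": num_nodes,
        "num_processes": num_nodes * gpus_per_node,  # one process per GPU
        "machine_rank": int(env["SLURM_NODEID"]),
    }

# --nodes=2 --gres=gpu:8 -> num_machines=2, num_processes=16
```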
Mini Benchmark
Training
Tests run on the imdb dataset:
Optimization | Model | Nb GPUs | Global Batch Size | Batch Size per GPU | Max GPU Memory Allocated | Estimated Epoch Time |
---|---|---|---|---|---|---|
DDP | bloom-1b7 | 4 | 4 | 1 | 22.6 GB | 15min 45s |
Accelerate ddp | bloom-1b7 | 4 | 4 | 1 | 22.6 GB | 15min 56s |
Deepspeed Zero3 | bloom-1b7 | 4 | 4 | 1 | 12.3 GB | 34min 25s |
FSDP | bloom-1b7 | 4 | 4 | 1 | 6.0 GB | 13min 24s |
QLoRA | bloom-1b7 | 4 | 1 | 4 | 3.6 GB | 1h 33min 30s |