Guide / FAQ

Slurm configuration

Most scripts

For most scripts, you first need to know how many GPUs your job requires. Set --ntasks-per-node to the number of GPUs per node, then launch the Python file with srun so that Slurm starts one task per GPU.
Here is an example with 4 GPUs:

#!/bin/bash
#SBATCH --job-name=example
#SBATCH --output=example.out
#SBATCH --error=example.out
#SBATCH --gres=gpu:4
#SBATCH --ntasks-per-node=4
#SBATCH --nodes=1
#SBATCH --hint=nomultithread
#SBATCH --time=00:50:00
#SBATCH --qos=qos_gpu-dev
#SBATCH --cpus-per-task=8
#SBATCH --account=example@a100
#SBATCH -C a100

module purge
module load cpuarch/amd
module load pytorch-gpu/py3/2.0.1

srun python example.py
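
With this layout, srun starts one Python process per GPU, so example.py can initialize torch.distributed directly from the Slurm environment. Below is a minimal sketch of such a script; the placeholder model and the MASTER_ADDR/MASTER_PORT rendezvous setup are illustrative assumptions, not part of the original example:

# example.py -- minimal sketch: torch.distributed init from Slurm variables.
import os
import subprocess
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

rank = int(os.environ["SLURM_PROCID"])         # global rank, set by srun
world_size = int(os.environ["SLURM_NTASKS"])   # one task per GPU
local_rank = int(os.environ["SLURM_LOCALID"])  # rank within this node

# Illustrative rendezvous: use the first node of the allocation as master.
master_addr = subprocess.check_output(
    ["scontrol", "show", "hostnames", os.environ["SLURM_JOB_NODELIST"]],
    text=True,
).splitlines()[0]
os.environ.setdefault("MASTER_ADDR", master_addr)
os.environ.setdefault("MASTER_PORT", "29500")

torch.cuda.set_device(local_rank)
dist.init_process_group("nccl", rank=rank, world_size=world_size)

model = torch.nn.Linear(10, 2).cuda()          # placeholder model
model = DDP(model, device_ids=[local_rank])
# ... training loop goes here ...
dist.destroy_process_group()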

Here is another example with 16 GPUs (2 nodes with 8 GPUs each):

#!/bin/bash
#SBATCH --job-name=example
#SBATCH --output=example.out
#SBATCH --error=example.out
#SBATCH --gres=gpu:8
#SBATCH --ntasks-per-node=8
#SBATCH --nodes=2
#SBATCH --hint=nomultithread
#SBATCH --time=00:50:00
#SBATCH --qos=qos_gpu-dev
#SBATCH --cpus-per-task=8
#SBATCH --account=example@a100
#SBATCH -C a100

module purge
module load cpuarch/amd
module load pytorch-gpu/py3/2.0.1

srun python example.py

Inference with accelerate

For inference with accelerate, --ntasks-per-node needs to be set to 1, whatever the number of GPUs you use: a single process manages all the GPUs it can see. Example with 4 GPUs:

#!/bin/bash
#SBATCH --job-name=example
#SBATCH --output=example.out
#SBATCH --error=example.out
#SBATCH --gres=gpu:4
#SBATCH --ntasks-per-node=1
#SBATCH --nodes=1
#SBATCH --hint=nomultithread
#SBATCH --time=00:50:00
#SBATCH --qos=qos_gpu-dev
#SBATCH --cpus-per-task=8
#SBATCH --account=example@a100
#SBATCH -C a100

module purge
module load cpuarch/amd
module load pytorch-gpu/py3/2.0.1

srun python example.py
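
With a single task driving all four GPUs, example.py can rely on accelerate's big-model inference: loading the model with device_map="auto" shards it across every GPU visible to that one process, which is why --ntasks-per-node must be 1. A minimal sketch, assuming a Hugging Face causal language model (the bigscience/bloom-1b7 checkpoint and the prompt are placeholders):

# example.py -- minimal sketch: sharded inference from a single process.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-1b7"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
# device_map="auto" lets accelerate spread the layers over all GPUs
# visible to this single process.
model = AutoModelForCausalLM.from_pretrained(
    model_name, device_map="auto", torch_dtype=torch.float16
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))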

DDP with accelerate

For DDP with accelerate, --ntasks-per-node also needs to be set to 1, whatever the number of GPUs you use. You can use idr_accelerate to launch your script: it replaces accelerate launch and automatically generates a config file from the Slurm parameters. This config file is saved in the .accelerate_config directory. Example with 16 GPUs:

#!/bin/bash
#SBATCH --job-name=example
#SBATCH --output=example.out
#SBATCH --error=example.out
#SBATCH --gres=gpu:8
#SBATCH --ntasks-per-node=1
#SBATCH --nodes=2
#SBATCH --hint=nomultithread
#SBATCH --time=00:50:00
#SBATCH --qos=qos_gpu-dev
#SBATCH --cpus-per-task=8
#SBATCH --account=example@a100
#SBATCH -C a100

module purge
module load cpuarch/amd
module load pytorch-gpu/py3/2.0.1

srun idr_accelerate example.py
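
The script launched by idr_accelerate is a standard accelerate training script, since the launcher handles process setup. A minimal sketch of what example.py could contain; the toy model, optimizer, and random data are placeholders:

# example.py -- minimal sketch of an accelerate training loop.
import torch
from accelerate import Accelerator

accelerator = Accelerator()

model = torch.nn.Linear(10, 2)                 # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
dataset = torch.utils.data.TensorDataset(      # placeholder random data
    torch.randn(256, 10), torch.randint(0, 2, (256,))
)
loader = torch.utils.data.DataLoader(dataset, batch_size=8, shuffle=True)

# prepare() wraps the model in DDP, moves it to the right device,
# and shards the dataloader across processes.
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

loss_fn = torch.nn.CrossEntropyLoss()
for epoch in range(2):
    for inputs, targets in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        accelerator.backward(loss)             # replaces loss.backward()
        optimizer.step()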

Mini Benchmark

Training

Tests on the IMDB dataset

Optimization      Model      Nb GPUs  Global Batch Size  Batch Size per GPU  Max GPU Memory Allocated  Estimated Epoch Time
DDP               bloom-1b7  4        4                  1                   22.6 GB                   15min 45s
Accelerate DDP    bloom-1b7  4        4                  1                   22.6 GB                   15min 56s
DeepSpeed ZeRO-3  bloom-1b7  4        4                  1                   12.3 GB                   34min 25s
FSDP              bloom-1b7  4        4                  1                   6.0 GB                    13min 24s
QLoRA             bloom-1b7  4        1                  4                   3.6 GB                    1h 33min 30s