Guide / FAQ

Slurm configuration

Most scripts

For most scripts, you first need to know how many GPUs your job requires. Set --ntasks-per-node to the number of GPUs per node, then launch the Python file with srun so that Slurm starts one task per GPU.
Here is an example with 4 GPUs:

#!/bin/bash
#SBATCH --job-name=example
#SBATCH --output=example.out
#SBATCH --error=example.out
#SBATCH --gres=gpu:4
#SBATCH --ntasks-per-node=4
#SBATCH --nodes=1
#SBATCH --hint=nomultithread
#SBATCH --time=00:50:00
#SBATCH --qos=qos_gpu-dev
#SBATCH --cpus-per-task=8
#SBATCH --account=example@a100
#SBATCH -C a100

module purge
module load cpuarch/amd
module load pytorch-gpu/py3/2.0.1

srun python example.py
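
With this layout, srun starts one Python process per GPU, so example.py can initialize torch.distributed directly from the Slurm environment. Below is a minimal sketch of such a script; the placeholder model and the MASTER_ADDR/MASTER_PORT rendezvous setup are illustrative assumptions, not part of the original example:

# example.py -- minimal sketch: torch.distributed init from Slurm variables.
import os
import subprocess
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

rank = int(os.environ["SLURM_PROCID"])         # global rank, set by srun
world_size = int(os.environ["SLURM_NTASKS"])   # one task per GPU
local_rank = int(os.environ["SLURM_LOCALID"])  # rank within this node

# Illustrative rendezvous: use the first node of the allocation as master.
master_addr = subprocess.check_output(
    ["scontrol", "show", "hostnames", os.environ["SLURM_JOB_NODELIST"]],
    text=True,
).splitlines()[0]
os.environ.setdefault("MASTER_ADDR", master_addr)
os.environ.setdefault("MASTER_PORT", "29500")

torch.cuda.set_device(local_rank)
dist.init_process_group("nccl", rank=rank, world_size=world_size)

model = torch.nn.Linear(10, 2).cuda()          # placeholder model
model = DDP(model, device_ids=[local_rank])
# ... training loop goes here ...
dist.destroy_process_group()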

Here is another example with 16 GPUs (2 nodes with 8 GPUs each):

#!/bin/bash
#SBATCH --job-name=example
#SBATCH --output=example.out
#SBATCH --error=example.out
#SBATCH --gres=gpu:8
#SBATCH --ntasks-per-node=8
#SBATCH --nodes=2
#SBATCH --hint=nomultithread
#SBATCH --time=00:50:00
#SBATCH --qos=qos_gpu-dev
#SBATCH --cpus-per-task=8
#SBATCH --account=example@a100
#SBATCH -C a100

module purge
module load cpuarch/amd
module load pytorch-gpu/py3/2.0.1

srun python example.py

Inference with accelerate

For inference with accelerate, --ntasks-per-node needs to be set to 1, whatever the number of GPUs you use: a single process manages all the GPUs it can see. Example with 4 GPUs:

#!/bin/bash
#SBATCH --job-name=example
#SBATCH --output=example.out
#SBATCH --error=example.out
#SBATCH --gres=gpu:4
#SBATCH --ntasks-per-node=1
#SBATCH --nodes=1
#SBATCH --hint=nomultithread
#SBATCH --time=00:50:00
#SBATCH --qos=qos_gpu-dev
#SBATCH --cpus-per-task=8
#SBATCH --account=example@a100
#SBATCH -C a100

module purge
module load cpuarch/amd
module load pytorch-gpu/py3/2.0.1

srun python example.py
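
With a single task driving all four GPUs, example.py can rely on accelerate's big-model inference: loading the model with device_map="auto" shards it across every GPU visible to that one process, which is why --ntasks-per-node must be 1. A minimal sketch, assuming a Hugging Face causal language model (the bigscience/bloom-1b7 checkpoint and the prompt are placeholders):

# example.py -- minimal sketch: sharded inference from a single process.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-1b7"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
# device_map="auto" lets accelerate spread the layers over all GPUs
# visible to this single process.
model = AutoModelForCausalLM.from_pretrained(
    model_name, device_map="auto", torch_dtype=torch.float16
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))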

DDP with accelerate

For DDP with accelerate, --ntasks-per-node also needs to be set to 1, whatever the number of GPUs you use. You can use idr_accelerate to launch your script: it replaces accelerate launch and automatically generates a config file from the Slurm parameters. This config file is saved in the .accelerate_config directory. Example with 16 GPUs:

#!/bin/bash
#SBATCH --job-name=example
#SBATCH --output=example.out
#SBATCH --error=example.out
#SBATCH --gres=gpu:8
#SBATCH --ntasks-per-node=1
#SBATCH --nodes=2
#SBATCH --hint=nomultithread
#SBATCH --time=00:50:00
#SBATCH --qos=qos_gpu-dev
#SBATCH --cpus-per-task=8
#SBATCH --account=example@a100
#SBATCH -C a100

module purge
module load cpuarch/amd
module load pytorch-gpu/py3/2.0.1

srun idr_accelerate example.py
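
The script launched by idr_accelerate is a standard accelerate training script, since the launcher handles process setup. A minimal sketch of what example.py could contain; the toy model, optimizer, and random data are placeholders:

# example.py -- minimal sketch of an accelerate training loop.
import torch
from accelerate import Accelerator

accelerator = Accelerator()

model = torch.nn.Linear(10, 2)                 # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
dataset = torch.utils.data.TensorDataset(      # placeholder random data
    torch.randn(256, 10), torch.randint(0, 2, (256,))
)
loader = torch.utils.data.DataLoader(dataset, batch_size=8, shuffle=True)

# prepare() wraps the model in DDP, moves it to the right device,
# and shards the dataloader across processes.
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

loss_fn = torch.nn.CrossEntropyLoss()
for epoch in range(2):
    for inputs, targets in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        accelerator.backward(loss)             # replaces loss.backward()
        optimizer.step()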

Mini Benchmark

Training

Tests on the IMDB dataset

Optimization      Model      Nb GPUs  Global Batch Size  Batch Size per GPU  Max GPU Memory Allocated  Estimated Epoch Time
DDP               bloom-1b7  4        4                  1                   22.6 GB                   15min 45s
Accelerate DDP    bloom-1b7  4        4                  1                   22.6 GB                   15min 56s
DeepSpeed ZeRO-3  bloom-1b7  4        4                  1                   12.3 GB                   34min 25s
FSDP              bloom-1b7  4        4                  1                   6.0 GB                    13min 24s
QLoRA             bloom-1b7  4        1                  4                   3.6 GB                    1h 33min 30s