SLURM/SGE Cheat Sheet

SLURM/SGE command equivalents

User commands

Description         Slurm command                     SGE command
Interactive login   srun --pty bash                   qlogin
                    srun -p "part_name" --pty bash
                    sdev
Job submission      sbatch [script file]              qsub [script file]
Job deletion        scancel [job_ID]                  qdel [job_ID]
Job status (all)    squeue --all                      qstat -f
Job status          squeue -j [job_ID]                qstat -u \* [-j job_ID]
Job status by user  squeue -u [user name]             qstat [-u user name]
Job hold            scontrol hold [job_ID]            qhold [job_ID]
Job release         scontrol release [job_ID]         qrls [job_ID]
Queue list          squeue                            qconf -sql
Node list           sinfo -N                          qhost
                    scontrol show nodes               qhost
Cluster status      sinfo                             qhost -q
GUI                 sview                             qmon
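
The submission row of the table above can be wrapped so that one script works on either cluster. This is a minimal sketch, assuming the scheduler can be recognised by the presence of `sbatch` or `qsub` on the PATH (note that `qsub` also exists in PBS/Torque, so this check is only a heuristic). The function names `detect_scheduler` and `submit_job` are hypothetical.

```shell
#!/bin/bash
# Report which scheduler's client tools are on PATH: "slurm", "sge" or "none".
detect_scheduler() {
    if command -v sbatch >/dev/null 2>&1; then
        echo slurm
    elif command -v qsub >/dev/null 2>&1; then
        echo sge
    else
        echo none
    fi
}

# Submit the job script given as $1 with whichever scheduler was detected.
submit_job() {
    case "$(detect_scheduler)" in
        slurm) sbatch "$1" ;;
        sge)   qsub "$1" ;;
        *)     echo "no scheduler found on PATH" >&2; return 1 ;;
    esac
}
```

Usage would look like `submit_job job.sh`, where `job.sh` stands for your own job script.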

Admin commands

Description         Slurm command                                                   SGE command
Version             sinfo --version                                                 qstat -help
Disable a node      scontrol update nodename=<node> state=drain reason="<reason>"   qmod -d <queue>@<node>
  in all queues     same command (node state spans all partitions)                  qmod -d \*@<node>
Enable a node       scontrol update nodename=<node> state=resume                    qmod -e <queue>@<node>
  in all queues     same command (node state spans all partitions)                  qmod -e \*@<node>
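
As a hedged illustration of the disable/enable rows, the helper below only prints the Slurm command it would run, so it is safe to try without admin rights. It assumes that `scontrol` expects `state=drain` together with a `reason=` when taking a node out of service. The function name `node_state_cmd`, the node name `node01` and the default reason are hypothetical.

```shell
#!/bin/bash
# Print (do not run) the Slurm command that drains or resumes a node.
# Paste the printed line into a shell with admin rights to execute it.
node_state_cmd() {
    local action=$1 node=$2 reason=${3:-maintenance}
    case "$action" in
        drain)
            echo "scontrol update nodename=$node state=drain reason=\"$reason\"" ;;
        resume)
            echo "scontrol update nodename=$node state=resume" ;;
        *)
            echo "usage: node_state_cmd drain|resume <node> [reason]" >&2
            return 1 ;;
    esac
}

node_state_cmd drain node01 "disk replacement"
node_state_cmd resume node01
```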

Environment variables

Description        Slurm variable          SGE variable
Job ID             $SLURM_JOBID            $JOB_ID
Submit directory   $SLURM_SUBMIT_DIR       $SGE_O_WORKDIR
Submit host        $SLURM_SUBMIT_HOST      $SGE_O_HOST
Node list          $SLURM_JOB_NODELIST     $PE_HOSTFILE
Job array index    $SLURM_ARRAY_TASK_ID    $SGE_TASK_ID
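
A job script that has to run under both schedulers can resolve these variables to common names. The sketch below is one possible way to do it, using shell default expansion: the Slurm value wins when set, then the SGE value, then a fallback. The function name `job_context` and the fallback values are assumptions made for this example.

```shell
#!/bin/bash
# Print the job context using whichever scheduler's variables are set.
job_context() {
    local job_id="${SLURM_JOBID:-${JOB_ID:-unknown}}"
    local submit_dir="${SLURM_SUBMIT_DIR:-${SGE_O_WORKDIR:-$PWD}}"
    local submit_host="${SLURM_SUBMIT_HOST:-${SGE_O_HOST:-unknown}}"
    local task_id="${SLURM_ARRAY_TASK_ID:-${SGE_TASK_ID:-}}"

    # Assumption: SGE sets SGE_TASK_ID to the string "undefined"
    # when the job is not an array job.
    if [ "$task_id" = "undefined" ]; then
        task_id=""
    fi

    echo "job=$job_id task=${task_id:-none} dir=$submit_dir host=$submit_host"
}

job_context
```

The node-list pair is deliberately left out: `$SLURM_JOB_NODELIST` holds a compact host expression, whereas `$PE_HOSTFILE` is the path of a file listing the hosts, so the two cannot be swapped directly.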

Job script parameters

Description            Slurm parameter                          SGE parameter
Script directive       #SBATCH                                  #$
Queue                  -p [queue]                               -q [queue]
Node count             -N [min[-max]]                           X
CPU count              -n [count]                               -pe [PE] [count]
Wall clock limit       -t [min]                                 -l h_rt=[seconds]
                       -t [days-hh:mm:ss]                       -l h_rt=[seconds]
Standard output        -o [file name]                           -o [file name]
Standard error         -e [file name]                           -e [file name]
Combine stdout/stderr  -o [file name] (without -e)              -j yes
Copy environment       --export=[ALL/NONE/var]                  -V
Event notification     --mail-type=[events]                     -m abe
Email address          --mail-user=[address]                    -M [address]
Job name               --job-name=[name]                        -N [name]
Job restart            --requeue                                -r yes
                       --no-requeue                             -r no
Working directory      --chdir=[dir_name] (formerly --workdir)  -wd [dir_name]
Resource sharing       --exclusive                              -l exclusive
                       --oversubscribe (formerly --shared)      -l exclusive
Memory size            --mem=[mem][M/G/T]                       -l mem_free=[memory][K/M/G]
                       --mem-per-cpu=[mem][M/G/T]               -l mem_free=[memory][K/M/G]
Account to charge      --account=[account]                      -A [account]
Tasks per node         --ntasks-per-node=[count]                (fixed allocation_rule in PE)
CPUs per task          --cpus-per-task=[count]                  X
Job dependency         --dependency=[state:job_ID]              -hold_jid [job_ID/job_NAME]
Job project            --wckey=[name]                           -P [name]
Job host preference    --nodelist=[nodes]                       -q [queue]@[node]
                       --exclude=[nodes]                        -q [queue]@@[hostgroup]
Quality of service     --qos=[name]                             X
Job arrays             --array=[array_spec]                     -t [array_spec]
Generic resources      --gres=[resource_spec]                   -l [resource]=[value]
Licenses               --licenses=[license_spec]                -l [license]=[count]
Begin time             --begin=YYYY-MM-DD[THH:MM[:SS]]          -a [YYMMDDhhmm]
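
The wall clock limit row is the one that most often needs an actual conversion, since the table gives the SGE value in seconds and the Slurm value as minutes or `days-hh:mm:ss`. The helper below is a small sketch of that arithmetic. The function name `seconds_to_slurm_time` is hypothetical.

```shell
#!/bin/bash
# Convert a wall clock limit in seconds (as in -l h_rt=[seconds])
# to the days-hh:mm:ss form accepted by Slurm's -t option.
seconds_to_slurm_time() {
    local total=$1
    local days=$(( total / 86400 ))
    local hours=$(( (total % 86400) / 3600 ))
    local mins=$(( (total % 3600) / 60 ))
    local secs=$(( total % 60 ))
    printf '%d-%02d:%02d:%02d\n' "$days" "$hours" "$mins" "$secs"
}

seconds_to_slurm_time 18000    # 5 hours, prints 0-05:00:00
```

The result can be used directly in a directive, for example `#SBATCH -t 0-05:00:00`.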

Examples

Scripts

Single-core application

Note: under Slurm, a job should not use more than 3 GB of memory.

Slurm single-core script

#!/bin/bash -l
# NOTE the -l flag!
#
#SBATCH -J test
#SBATCH -e test.output
#SBATCH -o test.output
# Running in the submit directory (SGE -cwd) is the default in Slurm
#SBATCH --mail-user [email protected]
#SBATCH --mail-type=ALL
# Request 5 hours run time
#SBATCH -t 5:0:0
#SBATCH -A your_project_id_here
#
#SBATCH -p core -n 1
#
# call your app here

SGE single-core script

#!/bin/bash
#
#
#$ -N test
#$ -j y
#$ -o test.output
#$ -cwd
#$ -M [email protected]
#$ -m bea
# Request 5 hours run time
#$ -l h_rt=5:0:0
#$ -P your_project_id_here
#
#$ -l mem=4G
#
# call your app here

MPI application

Slurm script

#!/bin/bash -l
# NOTE the -l flag!
#
#SBATCH -J test
#SBATCH -o test.output
#SBATCH -e test.output
# Running in the submit directory (SGE -cwd) is the default in Slurm
#SBATCH --mail-user [email protected]
#SBATCH --mail-type=ALL
# Request 5 hours run time
#SBATCH -t 5:0:0
#SBATCH -A your_project_id_here
#SBATCH --mem=4000
#SBATCH -p normal
#
# call your app here

SGE script

#!/bin/bash
#
#
#$ -N test
#$ -j y
#$ -o test.output
#$ -cwd
#$ -M [email protected]
#$ -m bea
# Request 5 hours run time
#$ -l h_rt=5:0:0
#$ -P your_project_id_here
#
#$ -l mem=4G
#
# call your app here
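
Both scripts above leave the launch line as a placeholder. For an MPI program, one possible launch line is sketched below, assuming the MPI library was built with support for the scheduler, that the task count was requested with `-n [count]` (Slurm) or `-pe [PE] [count]` (SGE), and that SGE exports the granted slot count as `$NSLOTS`. The function name `mpi_launcher` and the binary `./my_mpi_app` are hypothetical.

```shell
#!/bin/bash
# Choose the MPI launch prefix for the scheduler the job is running under.
mpi_launcher() {
    if [ -n "${SLURM_JOBID:-}" ]; then
        # Under Slurm, srun takes the task count from the allocation.
        echo "srun"
    elif [ -n "${NSLOTS:-}" ]; then
        # Under SGE, pass the number of granted slots to mpirun.
        echo "mpirun -np $NSLOTS"
    else
        # Not inside a job: fall back to a single local process.
        echo "mpirun -np 1"
    fi
}

# Show the full command line that a job script would run.
echo "$(mpi_launcher) ./my_mpi_app"
```

In a real job script the last line would run the command instead of printing it, for example `$(mpi_launcher) ./my_mpi_app`.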

Documentation

https://docs.hpc.shef.ac.uk/en/latest/referenceinfo/scheduler/SGE/sge_environment_variables.html
https://www.uppmax.uu.se/support/user-guides/sge-vs-slurm-comparison/

PDF: SGEtoSLURMconversion.pdf
PDF: scheduler_commands_cheatsheet-2020-ally.pdf
