Resource allocation and job submission

To run a job, the computational resources it needs must first be allocated. This is done via the Slurm workload manager, which distributes workloads across the cluster.

If you are used to another workload manager (PBS/Torque, LSF, SGE, LoadLeveler), you can use this guide to map your familiar commands to the Slurm environment.
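For quick orientation, the most common mappings are shown here for PBS/Torque; this is a generic Slurm correspondence, not specific to this cluster:

# PBS/Torque command      Slurm equivalent
# qsub job.sh         ->  sbatch job.sh        (submit a batch job)
# qstat                ->  squeue               (list jobs in the queue)
# qdel <jobid>         ->  scancel <jobid>      (cancel a job)
# qsub -I              ->  srun --pty bash      (start an interactive shell)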

Creating a job script

Warning

This is only a very basic example. Please see the Job Examples for more realistic templates for your scripts.

To run a job you need to create a job script. A job script is a Bash file with Slurm directives at the beginning, specifying the number of CPUs, the amount of memory, etc., needed for your job.

Let's say that you are interested in running a Python job with 4 CPU cores, 2 GPUs, and 32 GB RAM, with the running time limited to 0 days, 20 hours, 15 minutes, and 45 seconds (Slurm's D-HH:MM:SS time format). A sample job script might look like this:

#!/bin/bash
# Job name and file for stdout/stderr
#SBATCH --job-name=sample_job
#SBATCH --output=sample_job.out
# Resources: 4 CPU cores, 2 GPUs, 32 GB of RAM
#SBATCH --cpus-per-task=4
#SBATCH --gres=gpu:2
#SBATCH --mem=32G
# Time limit in D-HH:MM:SS
#SBATCH --time=0-20:15:45

# Start from a clean environment and load the required software
module purge
module load CUDA/9.1.85
module load cuDNN/7.0.5-CUDA-9.1.85
module load Anaconda3/5.0.1

# Check that the GPUs are visible, then run the job
nvidia-smi
python main.py
Next, you can submit your job via sbatch run.sh (assuming your script is named run.sh). This will submit and execute the job in batch mode (i.e., non-interactively). Its output will be written to the sample_job.out file.
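After submission, you can track and manage the job with the standard Slurm commands (<jobid> is the ID printed by sbatch):

squeue -u $USER     # list your pending and running jobs
scancel <jobid>     # cancel a job
sacct -j <jobid>    # show accounting information for a (finished) job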

If you need an interactive job, use the srun command instead. Let's say that you need 2 CPU cores, 1 GPU, and 16 GB RAM, and you do not intend to run your interactive session for more than 4 hours and 30 minutes. Then the command might look like this:

srun --cpus-per-task=2 --gres=gpu:1 --mem=16G --time=4:30:00 --partition gpu --pty bash
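Once the allocation is granted, you are placed in a shell on the assigned compute node and can work as usual; for example (reusing the modules from the batch script above):

module load CUDA/9.1.85 Anaconda3/5.0.1   # load the software you need
nvidia-smi                                # check the allocated GPU
python main.py                            # run your program interactively
exit                                      # end the session and release the resources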

If you do not need a GPU, please use the compute partition:

srun --cpus-per-task=2 --mem=16G --time=1:00:00 --partition compute --pty bash
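If you are unsure which partitions exist on the cluster and what their limits are, sinfo lists them:

sinfo    # list partitions, their nodes, states, and time limits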