Zest | Slurm Support
Zest is the Syracuse University research computing high-performance computing (HPC) cluster. Zest is a non-interactive Linux environment intended for analyses that require extensive parallelism or extended runtimes.
Looking for OrangeGrid? While similar, the Zest and OrangeGrid clusters are unique environments. Information about OrangeGrid is available on the OrangeGrid (OG) | HTCondor Support home page.
Accessing Zest
To access Zest, make an SSH connection using your NetID and the login node you have been assigned. Refer to the access email you received from Research Computing staff for your login node number. The cluster supports connections from the Windows command prompt (CMD), programs like PuTTY, and file transfers via SCP and WinSCP.
ssh netid@its-zest-loginX.syr.edu
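Since SCP is supported alongside SSH, file transfers can use the same host naming. The sketch below builds the login host name from a node number and wraps scp in two small helpers. The function names (zest_host, zest_push, zest_pull) are hypothetical and not part of any cluster tooling; only the host name pattern comes from this page.

```shell
#!/bin/bash
# Hypothetical helpers; only the host name pattern is taken from this page.

# Print the login host for a given login node number (from your access email).
zest_host() {
    printf 'its-zest-login%s.syr.edu\n' "$1"
}

# Copy a local file into your Zest home directory.
# usage: zest_push NETID NODE_NUMBER LOCAL_FILE
zest_push() {
    scp "$3" "$1@$(zest_host "$2"):"
}

# Copy a file from your Zest home directory into the current directory.
# usage: zest_pull NETID NODE_NUMBER REMOTE_FILE
zest_pull() {
    scp "$1@$(zest_host "$2"):$3" .
}
```

For example, `zest_push netid 1 job1.sh` would copy job1.sh to the home directory of netid via login node 1.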
Campus Wi-Fi or Off Campus?
Campus Wi-Fi and off-campus networks cannot access the cluster directly. Users on these networks need to connect to campus via Remote Desktop Services (RDS). RDS will provide you with a Windows 10 desktop that has access to your G: drive, campus OneDrive, and the research clusters. Once connected, SSH and SCP are available from the Windows command prompt, or you can use PuTTY and WinSCP, which are also installed. Full instructions and details for connecting to RDS are available on the RDS home page. Note that Azure VPN is an alternative option, but it is not available for all users. See the Azure VPN page for more details.
In rare cases where RDS is not an option, the research computing team may provide remote access via a bastion host.
Zest | Slurm Commands & Cluster Info
Slurm Commands
Once connected, below are some basic commands to get started.

    # Show node information. Use this to view available nodes and resources.
    sinfo
    # Show job queue.
    squeue
    # Display job accounting information.
    sacct
    # Submit job script.
    sbatch [script name]
    # Start an interactive job.
    salloc [options] [command]
    # Launch non-MPI parallel job steps, usually run in an SBATCH script.
    srun [options] [command]
    # Launch MPI parallel job steps, usually run in an SBATCH script (be sure to load associated modules).
    mpirun [options] [command]
    # Display running job status.
    sstat [jobid]
    # Cancel a job.
    scancel [jobid]

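The bare commands above print information for every user and every partition, so a few standard options help narrow the output. The following is a minimal sketch using common Slurm flags (`squeue -u`, `sacct -j --format`, `sinfo -p`); the wrapper function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical convenience wrappers around standard Slurm options.

# List queued and running jobs for one user (defaults to the current user).
my_jobs() {
    squeue -u "${1:-$USER}"
}

# Summarize a job by ID using selected accounting fields.
job_summary() {
    sacct -j "$1" --format=JobID,JobName,Partition,State,Elapsed,MaxRSS
}

# Show node state for a single partition, for example "gpu".
partition_nodes() {
    sinfo -p "$1"
}
```

Running `job_summary 781` after a job finishes would show its final state and elapsed time in one table.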
Learn more basics with the Slurm Quick Start User Guide.
Zest Cluster Local Storage
Note the default local storage locations.
Resource | Description |
---|---|
/home/NetID/ | NFS based user home directory available throughout the cluster |
/tmp/ | Temporary fast local storage only persistent for the current job |
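Because /tmp/ only persists for the current job, anything written there needs to be copied to the home directory before the job ends. The sketch below shows one way to do that from inside a job script; the function names and the results directory layout are assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical helpers for using node-local /tmp as scratch space in a job.

# Create a private scratch directory under /tmp (or another base) and print its path.
make_scratch() {
    mktemp -d "${1:-/tmp}/zest_scratch.XXXXXX"
}

# Copy everything from the scratch directory to a destination directory,
# then remove the scratch directory.
# usage: stage_out SCRATCH_DIR DEST_DIR
stage_out() {
    mkdir -p "$2" && cp -R "$1"/. "$2"/ && rm -rf "$1"
}
```

In an SBATCH script this might be used as `scratch=$(make_scratch)`, followed by the actual work inside `$scratch`, and finally `stage_out "$scratch" "$HOME/results/$SLURM_JOB_ID"`.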
Lmod Commands
Lmod is also available on the Zest cluster; examples are below.

    # Show all available modules.
    module avail
    # Load the module environment.
    module load [name]
    # Search for module names matching string.
    module spider [string]
    # Search module name or description.
    module keyword [string]
    # List currently loaded modules.
    module list
    # Unload a module from environment.
    module unload [name]
    # Remove all modules.
    module purge
    # Save currently loaded modules to collection name.
    module save [name]
    # Show all saved collections.
    module savelist
    # Restore modules from collection name.
    module restore [name]
    # Display all Lmod options.
    module help

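In a batch script, it can help to stop as soon as a required module fails to load rather than letting the job continue without it. The following is a minimal sketch, assuming `module load` returns a non-zero status on failure (as Lmod normally does); the function name load_modules is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: load each named module, stopping at the first failure.
load_modules() {
    local m
    for m in "$@"; do
        if ! module load "$m"; then
            echo "failed to load module: $m" >&2
            return 1
        fi
    done
}
```

A job script could then start with `load_modules cuda openmpi4/4.1.6 || exit 1` so that a missing module ends the job immediately.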
Zest Cluster Partitions
The Zest cluster has multiple partitions, including ones configured for CPU-intensive work, GPU utilization, and longer runtimes. Users can submit jobs to any partitions that meet their requirements. Below is a list of currently available partitions.
Partition | General Purpose | Max Runtime (Days) |
---|---|---|
normal (default) | Designed for CPU-intensive workloads. | 20 |
compute_zone2 | Designed for CPU-intensive workloads. | 20 |
longjobs | Designed for CPU-intensive workloads that require extended runtimes. | 40 |
gpu | Tailored for GPU-heavy computations. | 20 |
gpu_zone2 | Tailored for GPU-heavy computations. | 20 |
If no partitions are specified, the default partition will be used. The current default is the 'normal' partition. If one or more partitions are specified, only those will be considered to run that job.
To point a job at particular partitions, add them either individually or as a comma-separated list to the submission file, as in the examples below.

    # Submit a job to a single partition.
    #SBATCH --partition=compute_zone2
    # Submit a job to the CPU partitions.
    #SBATCH --partition=compute_zone2,normal
    # Submit a job to the GPU partitions.
    #SBATCH --partition=gpu_zone2,gpu

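The runtime limits in the partition table can also drive the choice of partition at submission time, since options given on the sbatch command line take precedence over `#SBATCH` lines in the script. The sketch below encodes the CPU rows of the table (20 days for normal and compute_zone2, 40 days for longjobs); the function name is hypothetical and the limits would need updating if the table changes.

```shell
#!/bin/bash
# Hypothetical helper: print a CPU partition list whose maximum runtime
# (per the partition table on this page) covers the requested number of days.
cpu_partitions_for_days() {
    local days=$1
    if [ "$days" -le 20 ]; then
        echo "normal,compute_zone2"
    elif [ "$days" -le 40 ]; then
        echo "longjobs"
    else
        echo "no CPU partition allows $days days" >&2
        return 1
    fi
}
```

For a 30-day run, this could be combined with a time limit as `sbatch --partition="$(cpu_partitions_for_days 30)" --time=30-00:00:00 job1.sh`.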
Submitting Jobs (with SBATCH Examples)
Submitting jobs on the Zest cluster requires the creation of an SBATCH script. Below are common examples including the use of MPI and GPUs.
Basic SBATCH Example
Below is a basic SBATCH example.

    #!/bin/bash
    #
    #SBATCH --nodes=1
    #SBATCH --ntasks=3
    #SBATCH --cpus-per-task=1
    #SBATCH --mail-type=ALL
    #SBATCH --mail-user=netid@syr.edu # replace netid with your NetID
    #
    # This runs hostname three times (tasks) on a single node
    #
    srun hostname

Assuming the above is 'job1.sh', use the 'sbatch' command to submit the job as seen below.

    netid@its-zest-login1:[~]$ sbatch job1.sh
    Submitted batch job 781
    netid@its-zest-login1:[~]$ more slurm-781.out
    node1002
    node1002
    node1002
    netid@its-zest-login1:[~]$

Note that the default output for jobs will be located in slurm-{jobid}.out.
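When submitting from a script, the job ID in the "Submitted batch job" message can be captured and used to locate the default output file. The following is a minimal sketch; both function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers for locating a job's default output file.

# Read sbatch's "Submitted batch job N" message on stdin and print N.
job_id_from_sbatch() {
    awk '/^Submitted batch job/ {print $4}'
}

# Print the default output file name for a job ID.
slurm_outfile() {
    printf 'slurm-%s.out\n' "$1"
}
```

A typical use would be `jobid=$(sbatch job1.sh | job_id_from_sbatch)` followed by `tail -f "$(slurm_outfile "$jobid")"` to follow the output as the job runs. Slurm also provides `sbatch --parsable`, which prints the job ID without the surrounding text.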
MPI SBATCH Example
Use mpirun for MPI. Ensure the necessary modules are loaded, either directly in the SBATCH file or in your ~/.bash_profile. If a script requires a module for compiling, ensure it is loaded prior to compiling.

    #!/bin/bash
    #
    #SBATCH --nodes=3
    #SBATCH --ntasks-per-node=20
    #SBATCH --cpus-per-task=1
    #SBATCH --mail-type=ALL
    #SBATCH --mail-user=netid@syr.edu
    #
    # Load required modules, run 'module avail' for latest
    module load imb # Loads Intel MPI Benchmarks
    module load openmpi4/4.1.6 # Load the openmpi4 module
    mpirun IMB-MPI1

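The script above requests 3 nodes with 20 tasks per node, so the job has 3 × 20 = 60 task slots for mpirun to fill. The sketch below computes that number from the environment variables Slurm sets inside a job when `--nodes` and `--ntasks-per-node` are given; the function name is hypothetical, and the fallback of 1 when a variable is unset is an assumption for running outside Slurm.

```shell
#!/bin/bash
# Hypothetical helper: print nodes x tasks-per-node for the current job.
# Falls back to 1 for either value when the variable is not set.
expected_ranks() {
    local nodes=${SLURM_JOB_NUM_NODES:-1}
    local per_node=${SLURM_NTASKS_PER_NODE:-1}
    echo $(( nodes * per_node ))
}
```

Adding `echo "Expecting $(expected_ranks) MPI ranks"` before the mpirun line gives a quick sanity check in the job output.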
Example GPU worker interactive job
Use an interactive shell allocation to compile and test applications.

    # This will start an interactive shell on a Supermicro GPU system
    # using 20 CPUs and 2 GPUs. If the resource is open, you’ll get a shell on
    # a worker node. Otherwise, srun will wait until the resource is available.
    netid@its-zest-login1:[~]$ srun --pty -p gpu -c 20 --gres=gpu:2 bash
    [netid@node1024 ~]$ cp -Rp /usr/local/cuda/samples CUDA_SAMPLES
    [netid@node1024 ~]$ cd CUDA_SAMPLES
    [netid@node1024 ~/CUDA_SAMPLES]$ make -j20 all
    [netid@node1024 ~/CUDA_SAMPLES]$ exit
    netid@its-zest-login1:[~]$

GPU SBATCH Example
Ensure you are using the GPU-enabled Slurm partitions, gpu and gpu_zone2.

    #!/bin/bash
    #
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=1
    #SBATCH --cpus-per-task=10
    #SBATCH --partition=gpu_zone2,gpu
    #SBATCH --gres=gpu:2
    #SBATCH --mail-type=ALL
    #SBATCH --mail-user=netid@syr.edu
    #
    nvidia-smi

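Besides nvidia-smi, a job can check how many GPUs it was actually given before starting real work. The sketch below counts the entries in CUDA_VISIBLE_DEVICES. It assumes the cluster's GPU configuration exports that variable for `--gres=gpu:N` requests, which is common with Slurm but not confirmed for Zest; the function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: count the comma-separated device IDs in
# CUDA_VISIBLE_DEVICES; print 0 when the variable is unset or empty.
gpu_count() {
    if [ -z "${CUDA_VISIBLE_DEVICES:-}" ]; then
        echo 0
    else
        echo "$CUDA_VISIBLE_DEVICES" | awk -F, '{print NF}'
    fi
}
```

With `--gres=gpu:2` as in the script above, a line such as `[ "$(gpu_count)" -eq 2 ] || exit 1` would end the job early if fewer GPUs are visible than requested.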
GPU SBATCH Using MPI Example
Note that mpirun is given the full path to the MPI executable.

    #!/bin/bash
    #
    #SBATCH --nodes=3
    #SBATCH --ntasks-per-node=1
    #SBATCH --cpus-per-task=10
    #SBATCH --partition=gpu_zone2,gpu
    #SBATCH --gres=gpu:2
    #SBATCH --mail-type=ALL
    #SBATCH --mail-user=netid@syr.edu
    #
    module load cuda
    module load openmpi4/4.1.6 # Load the openmpi4 module
    mpirun /home/netid/CUDA_SAMPLES/bin/x86_64/linux/release/simpleMPI

Advanced SBATCH Commands and Examples
SBATCH files can include some additional parameters depending on your computational needs. Below are some of the more common advanced parameters as well as an example SBATCH file.
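As a rough illustration, the sketch below combines several widely used sbatch options from the standard Slurm documentation: a job name, an output file pattern, a time limit, a memory request, and a job array. Which of these the Research Computing staff recommend for Zest is an assumption, and the job name, resource values, and input file naming are placeholders.

```shell
#!/bin/bash
#
# Name shown in squeue.
#SBATCH --job-name=array_demo
# Output file pattern: %x = job name, %A = array job ID, %a = array index.
#SBATCH --output=%x-%A_%a.out
# Wall-clock limit in days-hours:minutes:seconds.
#SBATCH --time=0-02:00:00
# Memory per node.
#SBATCH --mem=4G
# Run ten copies of this script, indexed 1 through 10.
#SBATCH --array=1-10
#SBATCH --partition=normal
#
# Each array task receives its own index; default to 1 when run outside Slurm.
task_id=${SLURM_ARRAY_TASK_ID:-1}
# Hypothetical input naming: data_1.txt, data_2.txt, and so on.
input="data_${task_id}.txt"
echo "Task ${task_id} would process ${input}"
```

Submitting this once with sbatch queues ten independent tasks, each writing to its own output file.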
Zest FAQ
Can I Use Docker with Zest?
The cluster doesn't support Docker directly; however, you can import Docker containers into Singularity. More information on Singularity is available at https://docs.sylabs.io/guides/3.6/user-guide/.
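Importing a Docker image usually comes down to `singularity pull` with a `docker://` URI, followed by `singularity exec` to run a command inside the resulting image file. The sketch below wraps both steps. The function names and the example image are hypothetical, and whether Singularity needs to be loaded as a module on Zest first is not stated here.

```shell
#!/bin/bash
# Hypothetical wrappers around standard Singularity commands.

# Convert a Docker image reference into a local Singularity image file.
# usage: docker_to_sif DOCKER_IMAGE OUTPUT_SIF
docker_to_sif() {
    singularity pull "$2" "docker://$1"
}

# Run a command inside a Singularity image file.
# usage: run_in_sif SIF_FILE COMMAND [ARGS...]
run_in_sif() {
    local sif=$1
    shift
    singularity exec "$sif" "$@"
}
```

For example, `docker_to_sif ubuntu:22.04 ubuntu.sif` followed by `run_in_sif ubuntu.sif cat /etc/os-release` would print the release information from inside the container.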
What packages are available on the login and worker nodes?
What Lmod modules are available on the login and worker nodes?
Additional Research Computing Resources
ITS Remote Desktop Services (RDS)
OrangeGrid/HTCondor Support Home Page
Research Computing Events and Colloquia
Getting Help
Questions about Research Computing? Any questions about using or acquiring research computing resources or access can be directed to researchcomputing@syr.edu.