OrangeGrid (OG) | HTCondor Support
Accessing OrangeGrid
To access OrangeGrid, make an SSH connection using your NetID and the login node you have been assigned. The example below uses 'its-og-login3.syr.edu'; refer to the access email you received from Research Computing staff for your node information. The cluster supports connections from the Windows command prompt (CMD), programs like PuTTY, and file transfer via SCP and WinSCP.
ssh netid@its-og-login3.syr.edu
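File transfer works the same way; for example, to copy a local file to your home directory on the login node with SCP (the file name here is only illustrative):

scp results.csv netid@its-og-login3.syr.edu: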
Campus Wi-Fi or Off-Campus?
Campus Wi-Fi and off-campus networks cannot access the cluster directly. Users off campus need to connect to campus via Remote Desktop Services (RDS). RDS provides a Windows 10 desktop with access to your G: drive, campus OneDrive, and the research clusters. Once connected, SSH and SCP are available from the Windows command prompt, or you can use PuTTY and WinSCP, which are also installed. Full instructions and details for connecting to RDS are available on the RDS home page. Note that Azure VPN is an alternative option, but it is not available for all users; see the Azure VPN page for more details.
In rare cases where RDS is not an option, the research computing team may provide remote access via a bastion host.
HTCondor Basics
Below are examples of the most common HTCondor commands.
Note that you can append '| less' to most commands to page through the output using the arrow keys. Quit the 'less' viewer by pressing 'q', or search for specific text with '/'.
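For example:

# Page through the full machine list; press 'q' to quit.
condor_status | less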
# To check the state of the machines available in the pool.
condor_status

# To see who is using the pool.
condor_userprio

# To see the status of jobs queued in the pool.
condor_q

# To see the status of jobs for a particular user.
condor_q <netid>

# To see the status of jobs for a particular user continuously over a specified time interval.
watch -n <time/seconds> condor_q <netid>

# To submit a job to the pool.
condor_submit

# To remove a submitted job.
condor_rm

# To see all the options for submitting a job and creating job submission files.
man condor_submit
In the output of condor_q, the first column, ID, is a unique identification number assigned by HTCondor to a job. The OWNER column gives the user name of the job owner. The column labeled ST gives the job state; a representative listing is shown after the list of states below.
The states that you will normally encounter are:
- I for idle. Your job is in Condor’s job queue, but is not currently running.
- R for running. Your job has been assigned to a CPU and is currently executing.
- H for held. There is a problem with your job that requires manual intervention.
- C for completed. Your job is finished and is ready to be removed from the queue.
- X for exiting. Your job is in the process of being removed from the queue.
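For orientation, here is a representative condor_q listing (illustrative values only; the exact columns vary by HTCondor version):

 ID      OWNER    SUBMITTED     RUN_TIME ST PRI SIZE CMD
 1234.0  netid   1/15 10:00   0+00:05:12 R  0   0.3  hello.sh
 1234.1  netid   1/15 10:00   0+00:00:00 I  0   0.0  hello.sh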
Learn more basics with the HTCondor Quick Start Guide.
HTCondor Generic Instructions
Please note that the documentation in the links above refers to a generic cluster. Some of the examples use a feature called Condor File Transfer to move data between the head node (where you log in) and the worker node (where the job runs). This is not needed and should not be used on OrangeGrid; omit any lines like:
should_transfer_files = Yes
when_to_transfer_output = ON_EXIT
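For reference, a minimal OrangeGrid submit file might then look like the following sketch (based on the hello.sh example later on this page, with the transfer lines left out):

universe   = vanilla
executable = hello.sh
output     = hello.$(cluster).$(process).out
error      = hello.$(cluster).$(process).err
log        = hello.$(cluster).log
queue 1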
Submitting Jobs
To submit a job in HTCondor, you generally need two files: a script (or other executable) and a job description file, often referred to as a "submit file". Once you have these two files, use the 'condor_submit' command to submit the job to the HTCondor system.
Simple Shell Script Example
Below is a basic shell script, called hello.sh in this example, that prints a welcome message and then some information about the host that it is running on.
#!/bin/bash

# If any command fails, exit with a non-zero exit code
set -e

# Print a welcome message
/bin/echo "Hello, world!"

# Dump some information about the host that we are running on
/bin/echo -n "Host name is "
/bin/hostname

/bin/echo -n "Linux kernel version is "
uname -r

/bin/echo -n "Operating system install is "
cat /etc/issue

# Exit successfully
exit 0
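Before running the script for the first time, make it executable (a standard shell step, not specific to HTCondor):

chmod +x hello.sh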
We can run the script from the command line on the head node to see what it prints. Running './hello.sh' will print:
Hello, world!
Host name is its-condor-submit
Linux kernel version is 2.6.38-10-virtual
Operating system install is Ubuntu 11.04 \n \l
Simple Submit File Example
Below is a basic HTCondor submit file using our hello.sh script.
# Which universe should be used? The vanilla universe can be used to run
# any regular executable that can run from the command line.
universe = vanilla

# Point the submit file to the script you would like to use. Use an absolute
# path if the script is in another location.
executable = hello.sh

# These next three lines tell Condor that it must transfer the input and
# output files to and from the computer that will actually execute our job.
# Condor will automatically transfer back all files that your job creates,
# but if you require any input files, you must explicitly specify them here
# by adding 'transfer_input_files = file1,file2,etc.'
transfer_executable = true
should_transfer_files = yes
when_to_transfer_output = on_exit_or_evict

# The next three lines specify the (relative) paths to the files that will
# store this information. Notice that we have used two Condor variables in
# these file names: $(cluster) and $(process). These two variables make up
# the two parts of the Condor job ID that we saw earlier when running condor_q.
output = hello.$(cluster).$(process).out
error = hello.$(cluster).$(process).err
log = hello.$(cluster).log

# This tells Condor to submit 1 job of this type to the pool.
queue 1

# Note: If we change the integer after queue, we can submit multiple identical
# jobs to the pool. Ex. 'queue 10' will submit 10 identical jobs to the pool,
# each with a unique process number.
Now that the script and submit file are created, submit the job:
# Submit the job.
condor_submit hello.sub

# Condor will respond telling you that the job was submitted and give you
# the cluster number, for example:
Submitting job(s).
1 job(s) submitted to cluster <JobID>
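After submitting, you can monitor or remove the job using the commands from the basics section above, for example:

# Check the status of your jobs.
condor_q <netid>

# Remove the job if needed, using the cluster number reported at submission.
condor_rm <JobID>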
Running GPU Jobs on OrangeGrid
Specific nodes within the OrangeGrid pool are equipped with Graphics Processing Units (GPUs). Typical uses include TensorFlow and PyTorch, but other uses and tools are welcome.
To take advantage of GPUs for your jobs, please include the following line within your HTCondor submit files.
+request_gpus = 1
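For example, a complete GPU submit file might look like the following sketch; 'gpu_job.sh' is a hypothetical script name, and the output file names are only illustrative:

universe   = vanilla
executable = gpu_job.sh

# Request a node equipped with a GPU.
+request_gpus = 1

output = gpu.$(cluster).$(process).out
error  = gpu.$(cluster).$(process).err
log    = gpu.$(cluster).log
queue 1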
OrangeGrid FAQ
Can I Use TensorFlow with OrangeGrid?
Yes, we have detailed instructions for Installing and Using Tensorflow with HTCondor.
Can I Use Docker with OrangeGrid?
The cluster doesn't support Docker directly; however, you can import Docker containers into Singularity. More information on Singularity is available at https://docs.sylabs.io/guides/3.6/user-guide/.
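For example, with Singularity 3.x you can pull an image from Docker Hub and convert it to a Singularity image file (SIF); the ubuntu:20.04 image below is only an illustration:

# Pull a Docker image and convert it to a SIF file (creates ubuntu_20.04.sif).
singularity pull docker://ubuntu:20.04

# Run a command inside the resulting container.
singularity exec ubuntu_20.04.sif cat /etc/os-release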
What Software is Available on OrangeGrid?
Software installation on the nodes is minimal, primarily due to the variety of software and versions needed by researchers. To get the needed software onto the nodes, the research computing team uses containers, which provide a self-contained environment that includes the software, libraries, and configuration a job needs. The team will assist with container creation in collaboration with the researcher.
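As an illustration, a Singularity definition file for such an environment might look like this minimal sketch (the package choices are hypothetical; the research computing team can help tailor one to your software):

Bootstrap: docker
From: ubuntu:20.04

%post
    # Install the software the job needs (example packages only).
    apt-get update
    apt-get install -y python3 python3-pip
    pip3 install numpy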
Getting Help
Questions about Research Computing? Any questions about using or acquiring research computing resources or access can be directed to researchcomputing@syr.edu.