Getting familiar with the shared Unity cluster is the first step in your journey to a successful project.
Knowing the proper usage and etiquette is important for avoiding misuse that may interfere with other cluster users.
Please carefully read the following instructions and the linked documentation webpages. Happy 696DS!
- You should have obtained your Unity account by now. If not, please follow the steps on this page listed under "Accounts for Students". Your PI is `pi_dhruveshpate_umass_edu`.
- Connect to the Unity GPU cluster: https://docs.unity.rc.umass.edu/documentation/connecting/ssh/.
- Change to your home directory: `cd ~`. This directory, `/home/<username>`, is on a slow HDD and should have 50GB of space. It is sufficient for this assignment. For your project, we will provide separate guidelines about using workspaces on a fast SSD.
- Clone this repository and enter it: `git clone https://github.com/dhruvdcoder/696ds-unity-assignment.git`, then `cd 696ds-unity-assignment`.
- Use conda to create an environment for installing relevant packages and libraries for each project.
Unity already has conda installed, and you can load it with `module load miniconda/22.11.1-1`. More about conda can be read here: https://docs.unity.rc.umass.edu/documentation/software/conda/.
- Create a conda environment for this assignment: `conda create --name 696hw1 python=3.9.7 pip`. You can replace `696hw1` with your preferred environment name.
- Activate the conda environment: `conda activate 696hw1`.
- Install packages in this environment: `pip install -r requirements.txt`.
- Unity consists of login nodes and computation nodes and uses SLURM to manage GPU allocation and job scheduling.
- Always run jobs (model training, testing, etc.) on computation nodes. Never run a job on a login node! Doing so may prevent others from even logging in.
- There are in general two ways to access GPUs.
  - (1) Request an interactive session with GPU access. You will be sent to a computation node and will be able to run bash and python scripts. Once you close your terminal, the GPU access is lost. This is suitable for debugging.
  - (2) Submit sbatch jobs to run in the background. Even if you close your terminal, the job will continue till it is complete. This is suitable for submitting multiple training or evaluation runs that will last for multiple hours.
- Please read more details here: https://docs.unity.rc.umass.edu/documentation/jobs/. The linked webpage also has sub-pages.
- Obtain an interactive session with GPU access and run `test/generate_id.py`: `srun --partition gpu --gres=gpu:1 -c 2 --mem=20GB -t 0-01:00:00 --pty /bin/bash`, followed by `module load miniconda/22.11.1-1`, `conda activate 696hw1`, and `python test/generate_id.py`. Note down the 5-character unique ID generated. This ID is unique for every student.
- Activate the conda environment and submit `test/generate_id.py` as an `sbatch` job to be run in the background. You may use `sbatch generate_id.sh` to do so. Note down the 5-character unique ID generated by this job in the log file `generate_id-<job_ID>.out`. Learn more about commonly used batch job commands here: https://docs.unity.rc.umass.edu/documentation/jobs/sbatch/.
- Paste the two unique IDs in the assignment on Gradescope.
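
To make the batch submission step more concrete, here is a minimal sketch of what a batch script could look like. It is not the repository's `generate_id.sh`, whose contents may differ. The file name `my_generate_id.sh` and its log file name are hypothetical, the resource requests simply mirror the interactive `srun` command in this assignment, and the script assumes the conda environment was activated before submission so the job inherits it.

```shell
# Write a hypothetical batch script. The name my_generate_id.sh is chosen
# here so that the repository's own generate_id.sh is left untouched.
cat > my_generate_id.sh <<'EOF'
#!/bin/bash
# Resource requests, mirroring the interactive srun example:
# gpu partition, one GPU, two CPU cores, 20 GB memory, one-hour limit.
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH -c 2
#SBATCH --mem=20GB
#SBATCH -t 0-01:00:00
# Log file for the job's output; %j expands to the job ID.
#SBATCH -o my_generate_id-%j.out

# Assumes the conda environment was activated in the submitting shell.
python test/generate_id.py
EOF

# Submit it from a login node (commented out so this snippet only writes a file):
# sbatch my_generate_id.sh
```

Once submitted, `squeue -u $USER` lists your pending and running jobs, and the output appears in the log file named by the `-o` line.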
You can add commands to `~/.bashrc` so that they run automatically when you ssh into a node on Unity.
```
# Open ~/.bashrc in an editor and add the lines below.
vim ~/.bashrc
# Add this line to avoid manually loading conda.
module load miniconda/22.11.1-1
# Add this line to avoid typing a long name.
# Instead of cd ~/696ds-unity-assignment,
# you will be able to just type cd $hw1.
export hw1=~/696ds-unity-assignment
# After saving and closing the editor, run this in your shell
# to make the above changes effective.
source ~/.bashrc
```
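
In the same spirit, you could also keep small helper functions in `~/.bashrc`. The sketch below is only a suggestion: the function names `gpu_shell` and `in_slurm_job` are hypothetical and not part of the assignment. The first wraps the interactive `srun` command from this assignment, and the second relies on the `SLURM_JOB_ID` environment variable, which SLURM sets inside a job, to tell whether the current shell is running inside an allocation.

```shell
# Hypothetical helper: request an interactive GPU session with the same
# resources as the srun command used in this assignment.
# Optional first argument: the time limit (defaults to 0-01:00:00).
gpu_shell() {
    srun --partition gpu --gres=gpu:1 -c 2 --mem=20GB \
        -t "${1:-0-01:00:00}" --pty /bin/bash
}

# Hypothetical helper: succeeds only when SLURM_JOB_ID is set and non-empty,
# i.e. when the shell is running inside a SLURM job rather than on a login node.
in_slurm_job() {
    [ -n "${SLURM_JOB_ID:-}" ]
}
```

After `source ~/.bashrc`, typing `gpu_shell` requests a one-hour session and `gpu_shell 0-02:00:00` requests two hours. A check such as `in_slurm_job || echo "You are not inside a SLURM job"` can serve as a reminder before starting heavy work.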