Project Link: https://machine-perception.ait.ethz.ch/project4/
cd to the root folder and run:
python -u src/train.py --model [model] --n_epochs 2000 --lr 0.0005 --seed 42 --divide_lr_every 400
where [model] can be one of:
- ConvAttModel for the GCN with the convolutional 1D attention network,
- GCNAttModel for the GCN model with encoder/decoder structure,
- GCNModel for the GCN model without an attention network.
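The flag --divide_lr_every 400 suggests a step schedule in which the learning rate is reduced every 400 epochs. The README does not say by how much, so the sketch below takes the divisor as an explicit, hypothetical factor argument. It only illustrates how such a step schedule would evolve over the 2000 epochs of the command above; it is not the project's implementation.

```python
from typing import List


def lr_at_epoch(epoch: int, base_lr: float, divide_every: int, factor: float) -> float:
    """Step schedule: divide base_lr by `factor` once per completed `divide_every` epochs.

    `factor` is a hypothetical parameter; the README does not state the divisor.
    """
    if divide_every <= 0:
        raise ValueError("divide_every must be positive")
    n_divisions = epoch // divide_every
    return base_lr / (factor ** n_divisions)


def schedule(n_epochs: int, base_lr: float, divide_every: int, factor: float) -> List[float]:
    """Learning rate used at each epoch 0..n_epochs-1."""
    return [lr_at_epoch(e, base_lr, divide_every, factor) for e in range(n_epochs)]


if __name__ == "__main__":
    # Values from the training command; factor=10 is purely illustrative.
    for epoch in (0, 399, 400, 800, 1999):
        print(epoch, lr_at_epoch(epoch, 0.0005, 400, 10))
```

With these values the rate changes at epochs 400, 800, 1200 and 1600, so a 2000-epoch run sees four reductions.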
Evaluate:
python src/evaluate.py --model_id [model id]
To create a new model, create a new Python script in the models folder. This script has to define a class that inherits from the BaseModel class found in base_model.py. You can then run train.py with this newly defined model by passing the flag --model followed by the name of the class. If no --model argument is given, DummyModel is selected by default.
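A minimal sketch of such a script follows. The real BaseModel in base_model.py is not shown in this README, so a tiny stand-in with a hypothetical forward method is defined here to keep the example self-contained. In the project you would import the real class instead (adjust the import to wherever base_model.py lives) and implement whatever methods it actually requires. The model name, the config key and the input layout are all hypothetical.

```python
from typing import List

Frame = List[float]


class BaseModel:
    """Hypothetical stand-in for the project's BaseModel (see base_model.py)."""

    def __init__(self, config: dict):
        self.config = config

    def forward(self, inputs: List[Frame]) -> List[Frame]:
        raise NotImplementedError


class ZeroVelocityModel(BaseModel):
    """Hypothetical example model: repeats the last observed frame as the prediction."""

    def forward(self, inputs: List[Frame]) -> List[Frame]:
        horizon = self.config.get("prediction_horizon", 1)  # hypothetical config key
        last_frame = inputs[-1]
        # Copy the frame for every predicted time step.
        return [list(last_frame) for _ in range(horizon)]
```

With a class like this in a script inside the models folder, it would be selected with --model ZeroVelocityModel.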
The create_model function in the models module will go through all the classes in all the scripts in the models folder, so you could technically define two different model classes in one script if necessary.
Predictions can be visualized using evaluate.py. If it is run with only --model_id provided, it predicts the target sequence of the test data and displays 10 randomly picked samples. If --eval_on_val is provided with the value 1, it evaluates the model on the validation set and displays the predictions alongside the ground truth.
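The two modes of the evaluation script can be sketched with argparse. This is a hypothetical illustration of the behaviour described above, not the script's actual code: only the flag names --model_id and --eval_on_val come from this README, and the helper names are made up.

```python
import argparse
import random
from typing import List, Optional, Sequence


def parse_eval_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse the two flags described above (a sketch; the real script may define more)."""
    parser = argparse.ArgumentParser(description="Visualize model predictions")
    parser.add_argument("--model_id", required=True, help="ID of the trained model")
    parser.add_argument("--eval_on_val", type=int, default=0,
                        help="1: evaluate on the validation set and show the ground truth")
    return parser.parse_args(argv)


def pick_samples(n_samples: int, k: int = 10, seed: Optional[int] = None) -> List[int]:
    """Indices of k randomly picked samples to display (all of them if fewer than k)."""
    rng = random.Random(seed)
    return rng.sample(range(n_samples), min(k, n_samples))


if __name__ == "__main__":
    args = parse_eval_args()
    split = "validation" if args.eval_on_val == 1 else "test"
    print("Evaluating model", args.model_id, "on the", split, "set")
```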
The predictions on the test data are stored in .csv.gz format, which can be uploaded directly to the submission website. After training, this file is automatically generated and placed in the model folder inside your experiment folder, along with the saved model parameters and configuration.
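To inspect or regenerate such a file, Python's gzip and csv modules are enough. The column names below are placeholders: the layout the submission website expects is not described in this README, so check it there before writing a file by hand.

```python
import csv
import gzip
import os
import tempfile
from typing import Dict, List, Sequence


def write_csv_gz(path: str, header: Sequence[str], rows: Sequence[Sequence[object]]) -> None:
    """Write rows to a gzip-compressed CSV file."""
    with gzip.open(path, "wt", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


def read_csv_gz(path: str) -> List[Dict[str, str]]:
    """Read a gzip-compressed CSV file into a list of dicts keyed by the header."""
    with gzip.open(path, "rt", newline="") as f:
        return [dict(row) for row in csv.DictReader(f)]


if __name__ == "__main__":
    # Hypothetical columns; the real submission format may differ.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "predictions.csv.gz")
        write_csv_gz(path, ["Id", "Prediction"], [[0, "0.1 0.2"], [1, "0.3 0.4"]])
        print(read_csv_gz(path))
```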
- Create a virtual environment in conda:
conda create --name MP python=3.7.4
- Activate it:
conda activate MP
- Install PyTorch and torchvision:
conda install pytorch=1.6.0 torchvision=0.7.0
- Then install the requirements. You might have to comment out torch and torchvision in requirements.txt first, since they were already installed in the previous step:
conda install --file requirements.txt
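- Add the MP_DATA and MP_EXPERIMENTS environment variables. On Windows, this can be done with the following commands (reactivate the environment afterwards so that the variables take effect):
conda env config vars set MP_DATA=..\..\project4_data
conda env config vars set MP_EXPERIMENTS=..\experiments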
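The scripts presumably read these two variables from the environment. A small sketch of such a lookup (the function name is hypothetical) that fails with a clear message when a variable is missing, which is what happens if the environment was not reactivated after setting them:

```python
import os
from pathlib import Path
from typing import Tuple


def get_mp_paths() -> Tuple[Path, Path]:
    """Return (data_dir, experiments_dir) from MP_DATA and MP_EXPERIMENTS.

    Relative values such as ..\\experiments are kept as-is, so they are
    interpreted relative to the current working directory when used.
    """
    missing = [name for name in ("MP_DATA", "MP_EXPERIMENTS") if name not in os.environ]
    if missing:
        raise EnvironmentError("Environment variable(s) not set: " + ", ".join(missing))
    return Path(os.environ["MP_DATA"]), Path(os.environ["MP_EXPERIMENTS"])
```

Because the Windows values above are relative paths, code that uses them as-is will resolve them against the directory the script is launched from, so launch it from the same folder each time.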
- The following should now run:
python train.py --model DummyModel
- Connect to the Leonhard host (from your terminal or with VS Code):
ssh [ethzusername]@login.leonhard.ethz.ch
- Clone the project (first time only):
git clone [ssh project link from Gitlab]
Warning: you may need to add your Leonhard SSH public key to GitLab to get access to the repository.
- Load the Python GPU module on Leonhard (see the wiki):
module load python_gpu/3.7.4
- Set the data and experiment paths:
export MP_DATA="/cluster/project/infk/hilliges/lectures/mp21/project4"
export MP_EXPERIMENTS="$HOME/Appliedscience/experiments"
- Add these commands to your ~/.bashrc file so that they run automatically whenever you log in to the cluster:
module load python_gpu/3.7.4
source $HOME/.local/bin/virtualenvwrapper.sh
workon "MP21"
export MP_DATA="/cluster/project/infk/hilliges/lectures/mp21/project4"
export MP_EXPERIMENTS="$HOME/Appliedscience/experiments"
Go to the folder where your train.py file is located. You can then submit a training job with the following command:
bsub -n 1 -W 4:00 -o [outputname] -J [jobname] -R "rusage[mem=8096, ngpus_excl_p=1]" python -u train.py --model [model_class_name]
You can add other command-line arguments such as --lr or --n_epochs. Careful: the output file of the job is saved in the folder you submitted the job from. If you run the command above, it will therefore end up in the local git repo in your Leonhard home directory. Please do not push any job output files to GitLab, as this would make the repository messy very quickly. Either use a common prefix for your output file names and add a pattern matching that prefix to .gitignore, or submit the job from outside your git repo:
bsub -n 1 -W 4:00 -o [outputname] -J [jobname] -R "rusage[mem=8096, ngpus_excl_p=1]" python -u mp_project/src/train.py --model [model_class_name]
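When submitting several runs (for example a small learning-rate sweep), it can help to build the bsub command programmatically. The sketch below reproduces the options of the command above; the helper name, the job names and the "job_" output prefix are hypothetical, the prefix being one way to apply the .gitignore convention (a single job_* pattern then matches every output file).

```python
import shlex
from typing import Dict, List, Optional


def build_bsub_command(job_name: str, output_file: str, train_script: str,
                       model: str, hours: int = 4, mem_mb: int = 8096,
                       extra_args: Optional[Dict[str, object]] = None) -> List[str]:
    """Build the argv for a bsub submission like the one above."""
    resources = "rusage[mem={}, ngpus_excl_p=1]".format(mem_mb)
    cmd = ["bsub", "-n", "1", "-W", "{}:00".format(hours), "-o", output_file,
           "-J", job_name, "-R", resources,
           "python", "-u", train_script, "--model", model]
    # Extra training flags such as --lr or --n_epochs are appended as-is.
    for flag, value in (extra_args or {}).items():
        cmd += ["--" + flag, str(value)]
    return cmd


if __name__ == "__main__":
    # Hypothetical sweep; the "job_" prefix lets one .gitignore pattern match all outputs.
    for lr in (0.001, 0.0005):
        cmd = build_bsub_command("gcn_lr{}".format(lr), "job_gcn_lr{}.out".format(lr),
                                 "mp_project/src/train.py", "GCNModel",
                                 extra_args={"lr": lr, "n_epochs": 2000})
        print(" ".join(shlex.quote(part) for part in cmd))
        # On Leonhard, the list could be submitted with subprocess.run(cmd, check=True).
```

Passing the command as a list to subprocess.run avoids shell quoting of the bracketed -R resource string.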