Requirements: Python 3.7
pip install -r requirements.txt
python3 train.py -i <input H5 File/Directory of Files to train on> -o <output directory> -t <H5 File/Directory of files to use as test set> -e <num epoch to train for> -c <config to use>
When training completes, plots of ROC AUC vs. epoch, loss vs. epoch, precision vs. epoch, the ROC curve for each tagger, and the confusion matrix are saved to the output directory, along with the trained model as a .pt file.
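For scripted or repeated runs, the training command above can be assembled programmatically. The following is a minimal sketch, not part of the repository: the helper name build_train_command and every path and config value in the usage lines are hypothetical placeholders. Only the flags -i, -o, -t, -e and -c come from the command shown above.

```python
import shlex


def build_train_command(input_path, output_dir, test_path, epochs, config):
    """Return the train.py invocation as an argument list.

    Only the flags documented in this README (-i, -o, -t, -e, -c) are used.
    """
    return [
        "python3", "train.py",
        "-i", str(input_path),   # H5 file or directory of files to train on
        "-o", str(output_dir),   # directory for plots and the .pt model file
        "-t", str(test_path),    # H5 file or directory of files for the test set
        "-e", str(epochs),       # number of epochs to train for
        "-c", str(config),       # config to use
    ]


# Hypothetical example values; substitute your own paths and config.
cmd = build_train_command("data/train", "results/run1", "data/test", 100, "my_config")

# Print a shell-safe version of the command (works on Python 3.7).
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The returned list can be passed directly to subprocess.run(cmd, check=True) from the repository root if you want to launch training from Python rather than the shell.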
The float (unquantized) 3-layer model is models.three_layer_model(), and the quantized (Brevitas) 3-layer model is models.three_layer_model_bv(). Either can be selected for training by setting current_model to one of the two.
At the moment, the precision of models.three_layer_model_bv() is set by self.weight_precision within the class in models.py, though this is likely to change in the future.
PRP Nautilus Kubernetes cluster (https://nautilus.optiputer.net/) instructions:
First create the persistent volume claim (PVC):
kubectl create -f pt-jet-class-vol.yml
The PVC stores the data and model outputs so that they persist after pods and jobs are deleted.
To do interactive work:
# create the pod
kubectl create -f pt-jet-class-pod.yml
# log in to the pod
kubectl exec -it pt-jet-class-pod -- bash
In particular, you can populate the PVC with the data:
cd /ptjetclassvol/
mkdir data
wget https://raw.githubusercontent.com/ben-hawks/pytorch-jet-classify/master/jet_data_download.sh
source jet_data_download.sh
To check on running pods:
kubectl get pods
kubectl describe pods pt-jet-class-pod
To delete the pod:
kubectl delete pods pt-jet-class-pod
The pod is also deleted automatically after 6 hours.
To launch a job:
kubectl create -f pt-jet-class-job.yml
To check on running jobs:
kubectl get jobs
You can also view the logs of a running job by first finding its pod name:
# get job's pod name
kubectl get pods
kubectl describe jobs pt-jet-class-job
# with pod's name, get logs
kubectl logs pt-jet-class-job-baseline-<random-string>
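The two-step log lookup above (find the pod name, then fetch its logs) can be automated. The sketch below is an illustration under stated assumptions rather than part of the repository: the function names job_pod_names and fetch_job_logs are hypothetical, it assumes kubectl is on the PATH and already configured for the cluster, and it assumes the job's pods can be recognised by the name prefix pt-jet-class-job, as in the example pod name above.

```python
import subprocess


def job_pod_names(kubectl_output, job_prefix):
    """Extract pod names that start with job_prefix from `kubectl get pods` text."""
    names = []
    for line in kubectl_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if fields and fields[0].startswith(job_prefix):
            names.append(fields[0])
    return names


def fetch_job_logs(job_prefix="pt-jet-class-job"):
    """Return a dict mapping each matching pod name to its log text."""
    listing = subprocess.run(
        ["kubectl", "get", "pods"],
        check=True, capture_output=True, text=True,
    ).stdout
    logs = {}
    for name in job_pod_names(listing, job_prefix):
        result = subprocess.run(
            ["kubectl", "logs", name],
            check=True, capture_output=True, text=True,
        )
        logs[name] = result.stdout
    return logs
```

Calling fetch_job_logs() on a machine with cluster access returns the logs keyed by pod name; job_pod_names can also be used on its own with output you have already captured.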