This assumes CUDA 10.1 is installed on your GPU machine. You can also train on a CPU, but it will be very slow.
To train with kubernetes:
- cd into /train
- generate the kubernetes yaml files
  - open generate_train_yamls.sh and set the desired run number (we did 5 runs of each method for the paper)
  - select where the training will be run (lis or rrg)
  - run generate_train_yamls.sh to generate the yaml files
- run the experiment with: kubectl apply -f relevant_yaml_exp_file
  - for active learning, these files will be in train/yaml_files
- stop pod with: kubectl delete -f relevant_yaml_exp_file
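Because several yaml files may be generated (one per experiment and run), the apply and delete steps above can be looped. The following is a minimal sketch, not part of the repo: the helper name print_kubectl_cmds is hypothetical, and it assumes the generated files end in .yaml and sit in a single directory. It only prints the commands, so they can be checked before anything is executed.

```shell
# Hypothetical helper (not part of the repo): print one kubectl command
# per yaml file found in a directory.
# Usage: print_kubectl_cmds ACTION DIR   (ACTION is "apply" or "delete")
print_kubectl_cmds() (
    action="$1"
    dir="$2"
    for f in "$dir"/*.yaml; do
        # If no file matches, the glob stays unexpanded; skip it.
        [ -e "$f" ] || continue
        printf 'kubectl %s -f %s\n' "$action" "$f"
    done
)

# Prints nothing if the directory has no yaml files.
print_kubectl_cmds apply yaml_files
```

After checking the printed commands, they can be executed by piping them to a shell, for example: print_kubectl_cmds apply yaml_files | sh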
To train on a Google Cloud VM:
- spin up a VM using the command in create-vm (use a distinct INSTANCE_NAME)
- ssh into VM using command in gcp_setup
- in VM: git clone https://github.com/carismoses/stacking_experiments.git
- run with the relevant SPECIFIC_ENV_FILE.list: docker run --runtime nvidia --env-file stacking_experiments/train/env_var_files/SPECIFIC_ENV_FILE.list carismoses/stacking-train:latest
  - other optional args (these must go before the image name): --env CUDA_VISIBLE_DEVICES=$DEVICE_NUM --name $NAME
- To kill a VM instance:
To evaluate with kubernetes:
- cd into /evaluate
- generate the kubernetes yaml files
  - open generate_eval_yamls.sh
  - in the exp_path list variable, list all of the experiment directories to be evaluated (the directory names as they appear in minio)
  - run generate_eval_yamls.sh to generate the yaml files
- run the evaluation with: kubectl apply -f yaml_files/relevant_yaml_exp_file
- stop pod with: kubectl delete -f yaml_files/relevant_yaml_exp_file
To run an experiment with docker:
- set env variables in relevant .sh file (train/train.sh or evaluate/eval.sh)
- rebuild docker image (train/Dockerfile or evaluate/Dockerfile)
- run (local) or push and run (remote)
- NOTE: the bash files are set up to copy results to minio. You might have to comment this out.
Experiment Naming:
- strategy: random, bald, subtower, subtower-greedy
- sampler: sequential, random
- model_type: fcgn-fc, fcgn, lstm
- run: 0, 1, 2, 3, 4
- filename: strategy-sampler-model-type-run.yaml
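The naming scheme above can be sketched as a small shell helper. This is only an illustration: the function names are hypothetical, and it assumes the four components are joined with hyphens in the order shown, with values such as fcgn-fc and subtower-greedy keeping their own hyphens. Whether every combination was actually run is not stated here, so the full enumeration is just the upper bound.

```shell
# Hypothetical helper (not part of the repo): build one yaml filename.
# Usage: exp_yaml_name STRATEGY SAMPLER MODEL_TYPE RUN
exp_yaml_name() {
    printf '%s-%s-%s-%s.yaml\n' "$1" "$2" "$3" "$4"
}

# Enumerate all combinations of the listed values
# (4 strategies x 2 samplers x 3 model types x 5 runs).
list_all_exp_yaml_names() (
    for strategy in random bald subtower subtower-greedy; do
        for sampler in sequential random; do
            for model_type in fcgn-fc fcgn lstm; do
                for run in 0 1 2 3 4; do
                    exp_yaml_name "$strategy" "$sampler" "$model_type" "$run"
                done
            done
        done
    done
)

exp_yaml_name bald sequential fcgn-fc 0   # prints bald-sequential-fcgn-fc-0.yaml
```

Note that because some values contain hyphens themselves, a filename cannot be split back into its components by cutting on "-" alone.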
NOTE:
- The base image (carismoses/stacking:latest) used in train/Dockerfile and evaluate/Dockerfile is up to date and pushed to Dockerhub, so you should only need to make changes to train/Dockerfile and evaluate/Dockerfile
- You will probably only need to make changes in train.sh and eval.sh. Then rebuild the relevant Docker image (and push it to your own Dockerhub if running remotely)
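To build the images (run each command in the directory containing the corresponding Dockerfile):
- base image: docker build --tag=carismoses/stacking:latest .
- training image: docker build --tag=carismoses/stacking-train:latest .
- evaluation image: docker build --tag=carismoses/stacking-eval:latest .
To push or pull an image:
- docker push relevant_image_tag
- docker pull relevant_image_tag
To run a container:
- gpu: docker run --runtime nvidia relevant_image_tag
- cpu: docker run relevant_image_tag
To open an interactive shell in a container:
- gpu: docker run -it --runtime nvidia relevant_image_tag /bin/bash
- cpu: docker run -it relevant_image_tag /bin/bash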
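The docker run variants used in this document differ only in a few flags (--runtime nvidia for GPU, and optionally --env-file, --env CUDA_VISIBLE_DEVICES and --name). As a rough sketch, a helper like the one below can assemble the command line. The function name docker_run_cmd and its argument order are hypothetical, it only prints the command rather than running it, and it assumes none of the values contain spaces.

```shell
# Hypothetical helper (not part of the repo): print a docker run command line.
# Usage: docker_run_cmd MODE IMAGE [ENV_FILE] [DEVICE_NUM] [NAME]
#   MODE is "gpu" or "cpu"; empty optional arguments are skipped.
docker_run_cmd() (
    mode="$1"
    image="$2"
    env_file="${3:-}"
    device_num="${4:-}"
    name="${5:-}"

    cmd="docker run"
    if [ "$mode" = "gpu" ]; then
        cmd="$cmd --runtime nvidia"
    fi
    if [ -n "$env_file" ]; then
        cmd="$cmd --env-file $env_file"
    fi
    if [ -n "$device_num" ]; then
        cmd="$cmd --env CUDA_VISIBLE_DEVICES=$device_num"
    fi
    if [ -n "$name" ]; then
        cmd="$cmd --name $name"
    fi
    # Options go before the image; anything after the image is passed
    # to the container as its command.
    printf '%s %s\n' "$cmd" "$image"
)

docker_run_cmd gpu carismoses/stacking-train:latest \
    stacking_experiments/train/env_var_files/SPECIFIC_ENV_FILE.list 0 my-run
docker_run_cmd cpu carismoses/stacking-eval:latest
```

Here my-run and the device number 0 are placeholder values; replace them and SPECIFIC_ENV_FILE.list with your own.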