Please make sure everything runs under Python 3.
We recommend two ways to set up the necessary environment:
- Using conda (miniconda3) (preferred).
- Using a Python virtual environment.
Go to the above link to install the miniconda3 build corresponding to your OS.
Next, do the following:
# cs188 is the name of the conda environment; you can name it anything you like.
conda create -n cs188 python=3.8
# You can list all the conda envs using the following command.
conda info --envs
# Initialize conda for your shell (only needed once; restart the shell afterwards).
conda init
# Activate the conda env.
conda activate cs188
# Install pip in this conda environment.
conda install pip
# Use the following to deactivate the conda environment
# when you're done working and wish to log out.
conda deactivate
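Optionally, while the environment is activated, you can sanity-check the setup with standard shell commands:
# Inside the activated env, the interpreter should report Python 3.8.x
# and resolve to a path inside the cs188 env directory.
python --version
which python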
Then install all the required packages simply by:
pip install --upgrade pip
# Install PyTorch; the --no-cache-dir flag lets you install on a machine with limited RAM.
pip3 --no-cache-dir install torch==1.10.1+cu113 torchvision==0.11.2+cu113 torchaudio==0.10.1+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
# The below command will bulk install everything needed.
pip install -r requirements.txt
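To verify the installation, a quick optional check (the second value printed should be `True` on a CUDA-enabled machine):
# Print the installed torch version and whether CUDA is available.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"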
This is a nice guide on setting up virtual environments; we recommend doing the following:
python3 -m venv where_ever/you/want/cs188
source where_ever/you/want/cs188/bin/activate
# Use the following to deactivate the virtualenv.
deactivate
The rest of the package installation is the same as with conda.
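Likewise, you can confirm the venv is active before installing anything:
# Both python and pip should resolve to paths inside the venv directory.
which python pip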
For cloud services, please refer to the slides from the respective TA section; this README only contains project-related instructions.
If your machine has multiple GPUs, please use only one of them: our task is not heavy, so multi-GPU training generally wastes computational resources (but feel free to use more if you'd like).
# Check your GPU status; this prints all the essential information about the GPUs.
nvidia-smi
# Use the GPU with ID 0.
export CUDA_VISIBLE_DEVICES="0"
# Or the GPU with ID 2.
export CUDA_VISIBLE_DEVICES="2"
# For multi-gpu settings, suppose we would like to use 0, 1, 2 (3 GPUs).
export CUDA_VISIBLE_DEVICES="0,1,2"
# For CPU-only execution (hence `no_cuda`).
export CUDA_VISIBLE_DEVICES=""
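As an optional sanity check, you can confirm how many devices PyTorch sees after exporting the variable:
# With CUDA_VISIBLE_DEVICES="0" this should print 1; with "" it prints 0.
python -c "import torch; print(torch.cuda.device_count())"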
For a basic execution demo, after finishing all the required `TODO`s in `trainers/*.py`, execute the following command:
sh scripts/train_dummy.sh
You should immediately see training and evaluation loops executing successfully.
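If you want to keep the console output for later inspection, a standard shell pattern works (the log file name here is arbitrary):
# Run the demo and mirror all output (stdout + stderr) into a log file.
sh scripts/train_dummy.sh 2>&1 | tee dummy_run.log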
All the required `TODO` blocks are in the code under the `data_processing` folder.
Please refer to the README under the `data_processing` folder for the data schema and execution examples.
Please refer to the README under the `trainers` folder for details of the `TODO`s.
We have prepared a few scripts for you to use:
# Script for finetuning a model on Sem-Eval dataset.
sh scripts/train_semeval.sh
# Script for finetuning a model on Com2Sense dataset.
sh scripts/train_com2sense.sh
# Script for running MLM pretraining on a dataset.
sh scripts/run_pretraining.sh
# Script for loading an MLM-pretrained model and continuing finetuning.
sh scripts/finetune_from_pretrain_dummy.sh
In any training script, if you comment out (or remove) the `--do_train` argument, the script will only run testing. In this case, please ensure `--iters_to_eval` is properly set to the checkpoint(s) you would like to evaluate.
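To find valid checkpoint iterations, you can simply list the checkpoint directory (the path below is just the example used later in this README):
# List saved checkpoints; the numeric suffixes are the iterations
# you can pass to --iters_to_eval.
ls outputs/pretrain/ckpts/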
Other essential arguments to be aware of, which you SHOULD TRY TO TUNE, are listed below (see the sketch after this list for how they typically fit together):
- `TASK_NAME` is to be consistent with `data_processing/__init__.py`.
- `DATA_DIR` should correspond to a proper dataset folder.
- `MODEL_TYPE` should be one of the model cards, e.g. `bert-base-cased`.
- `model_name_or_path` can be the same as `MODEL_TYPE`, or a saved directory containing a checkpoint, e.g. `outputs/pretrain/ckpts/checkpoint-100`, for restoring or finetuning a trained checkpoint.
  - Remember that you CAN ONLY use one of the following models for training on both Sem-Eval and Com2Sense: `{bert, deberta, roberta}`.
- `tokenizer_name` and `config_name` can either both be left unspecified (hence falling back to `model_name_or_path`) or be the same as `MODEL_TYPE`.
- `per_gpu_train_batch_size` and `per_gpu_eval_batch_size` tune the training/evaluation batch sizes.
- `learning_rate` tunes the initial learning rate.
- `num_train_epochs` is the maximum number of epochs to train the model.
- `max_seq_length` is the maximum sequence length of inputs to the model.
- `output_dir` is where your outputs and checkpoints are saved.
- `save_steps` denotes how many steps between model saves.
- `logging_steps` denotes how many steps between evaluations during training.
- `max_eval_steps` is the maximum number of evaluation steps on an evaluation split of a dataset.
- `eval_split` is the split to be evaluated on; it should be `dev` during training and `test` during testing.
- `iters_to_eval` gives the iterations of the saved checkpoints to be evaluated.
  - If you implement save-best functionality, it can also be `best` instead of a number.
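For reference, here is a rough, hypothetical sketch of how a training script might wire these arguments together; the entry-point name (`main.py`), the dataset path, and the exact flag spellings are assumptions on our part, so always mirror the provided scripts under `scripts/` instead:
# HYPOTHETICAL sketch -- entry-point name and flag spellings are assumptions;
# check the real scripts under scripts/ for the actual invocation.
TASK_NAME="com2sense"            # must match a key in data_processing/__init__.py
DATA_DIR="datasets/com2sense"    # hypothetical path; point it at a real dataset folder
MODEL_TYPE="bert-base-cased"

python3 main.py \
  --do_train \
  --task_name "${TASK_NAME}" \
  --data_dir "${DATA_DIR}" \
  --model_type "${MODEL_TYPE}" \
  --model_name_or_path "${MODEL_TYPE}" \
  --per_gpu_train_batch_size 16 \
  --per_gpu_eval_batch_size 16 \
  --learning_rate 3e-5 \
  --num_train_epochs 3 \
  --max_seq_length 128 \
  --output_dir "outputs/com2sense" \
  --save_steps 100 \
  --logging_steps 100 \
  --eval_split "dev"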
It is often important to visualize your training curves and other essential information during training, both for troubleshooting problems and for ensuring your training is stable (e.g. checking whether your model is over- or under-fitting).
People often use TensorBoard (the PyTorch-compatible version) for this purpose.
Please check the introduction in the link to get familiar with this very useful tool.
Each execution of your code will automatically create a subfolder under the `runs` folder; e.g. `runs/Jan01_06-03-20_pluslab-a100_ckpts_dummy` could be a subfolder containing the TensorBoard events of one of your executions.
Execute the following command to render TensorBoard in your browser:
tensorboard --logdir=the/tensorboard/event/dir
# For example:
tensorboard --logdir=runs/Jan01_06-03-20_pluslab-a100_ckpts_dummy
You should see the following output:
Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
TensorBoard 2.7.0 at http://localhost:6006/ (Press CTRL+C to quit)
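If the default port 6006 is already taken on the server, TensorBoard's `--port` flag lets you serve on another one (then tunnel that port instead in the next step):
tensorboard --logdir=runs/Jan01_06-03-20_pluslab-a100_ckpts_dummy --port=6007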
Since `localhost` here refers to the server, unless you can directly access a GUI on the server, you may find SSH tunneling handy.
For example, if you execute the following command on your local terminal:
ssh -N -f -L localhost:6006:localhost:6006 your_id@the_server_name
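Here `-N` tells ssh not to run a remote command, `-f` sends it to the background, and `-L` forwards your local port 6006 to port 6006 on the server. Since the tunnel keeps running in the background, you can close it later by killing the ssh process, e.g.:
# Find and terminate the background tunnel started above.
pkill -f "ssh -N -f -L localhost:6006"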
Then open a browser on your local machine, enter http://localhost:6006/ in the address bar, and you will see the rendered TensorBoard!