This repository contains the code required for the pre-training of seBERT as well as an example for fine-tuning and using the model.
- Prepare the pre-training data.
bash scripts/prepare_data.sh
- Build the NVIDIA Docker container environment.
bash scripts/docker/build.sh
- Launch the container.
bash scripts/docker/launch.sh
- From within the container run:
bash scripts/run_pretraining_seBERT.sh
The complete fine-tuning and evaluation code can be found in the fine_tuning folder.
In this section, we give a small example of the fine-tuning process, so that you can adapt the pre-trained seBERT model to your own tasks, as well as an example of using the fine-tuned model.
The fine-tuning example can be found in the Jupyter notebook notebooks/FineTuning.ipynb.
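At its core, the notebook is a standard transformers fine-tuning loop. The sketch below shows the idea once the setup steps further down (virtualenv, data, and model download) are done; the model path, CSV column names, label count, and hyperparameters here are assumptions, not the notebook's exact values, so adapt them to your task.

```python
import pandas as pd
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import BertForSequenceClassification, BertTokenizerFast

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Assumed extraction path of the pre-trained model (see the download step below).
model_dir = 'models/seBERT_pre_trained'
tokenizer = BertTokenizerFast.from_pretrained(model_dir)
# num_labels=3 assumes a three-class task (e.g. negative/neutral/positive sentiment).
model = BertForSequenceClassification.from_pretrained(model_dir, num_labels=3).to(device)

# Assumed CSV layout: a 'text' column and an integer 'label' column;
# adapt these names to the actual columns of the CSV you use.
df = pd.read_csv('data/github_gold.csv')
enc = tokenizer(list(df['text']), truncation=True, padding='max_length',
                max_length=128, return_tensors='pt')
dataset = TensorDataset(enc['input_ids'], enc['attention_mask'],
                        torch.tensor(df['label'].values))
loader = DataLoader(dataset, batch_size=16, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for epoch in range(2):  # a few epochs are usually enough for fine-tuning
    for input_ids, attention_mask, labels in loader:
        optimizer.zero_grad()
        out = model(input_ids=input_ids.to(device),
                    attention_mask=attention_mask.to(device),
                    labels=labels.to(device))
        out.loss.backward()
        optimizer.step()

model.save_pretrained('models/seBERT_fine_tuned')
tokenizer.save_pretrained('models/seBERT_fine_tuned')
```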
In addition, we give an example of using the fine-tuned model for a sequence classification task in notebooks/Classification.ipynb.
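Using the fine-tuned model then amounts to tokenizing the input and taking the argmax over the logits. A minimal sketch, assuming the fine-tuned model was saved to models/seBERT_fine_tuned as in the sketch above:

```python
import torch
from transformers import BertForSequenceClassification, BertTokenizerFast

# Assumed save path from the fine-tuning sketch above.
model_dir = 'models/seBERT_fine_tuned'
tokenizer = BertTokenizerFast.from_pretrained(model_dir)
model = BertForSequenceClassification.from_pretrained(model_dir)
model.eval()

text = "This fix works, thanks a lot!"
enc = tokenizer(text, truncation=True, max_length=128, return_tensors='pt')
with torch.no_grad():
    logits = model(**enc).logits
print(logits.argmax(dim=-1).item())  # index of the predicted class
```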
Create a virtualenv within the notebooks directory, install the dependencies, and download the sentiment data.
cd notebooks
python3.8 -m venv .
source bin/activate
pip install --upgrade pip
pip install torch pandas numpy scikit-learn transformers jupyterlab ipywidgets
cd data
wget -O github_gold.csv https://figshare.com/ndownloader/files/21001260
cd ../../
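To sanity-check the download before fine-tuning, you can inspect the CSV with pandas. The snippet below makes no assumptions about the column layout and just prints it; it assumes you run it from the repository root.

```python
import pandas as pd

# Adjust sep= if the file turns out not to be comma-separated.
df = pd.read_csv('notebooks/data/github_gold.csv')
print(df.shape)    # number of rows and columns
print(df.columns)  # inspect the column names before fine-tuning
print(df.head())
```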
Download and extract the pre-trained seBERT model.
cd notebooks/models
wget https://smartshark2.informatik.uni-goettingen.de/sebert/seBERT_pre_trained.tar.gz
tar -xzf seBERT_pre_trained.tar.gz
cd ../../
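To verify the extraction worked, the model should load with transformers. A minimal check, assuming the archive unpacks to a transformers-compatible seBERT_pre_trained directory and that you run it from the repository root:

```python
from transformers import BertModel, BertTokenizerFast

# Assumed extraction directory; adjust if the archive unpacks differently.
model_dir = 'notebooks/models/seBERT_pre_trained'
tokenizer = BertTokenizerFast.from_pretrained(model_dir)
model = BertModel.from_pretrained(model_dir)
print(model.config.num_hidden_layers, model.config.hidden_size)
```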
Start JupyterLab.
cd notebooks
jupyter lab