Multi-word bias neutralization
In the project folder, do the following:
$ python3 -m venv venv
$ source venv/bin/activate
$ pip install -r requirements.txt
$ python
>>> import nltk; nltk.download("punkt")
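
To verify the punkt download from the same Python session, a quick check (the sample sentence is arbitrary):

```python
import nltk

# word_tokenize uses the punkt data downloaded above;
# this raises LookupError if the download did not succeed.
print(nltk.word_tokenize("The download worked."))
# ['The', 'download', 'worked', '.']
```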
You need to download the pretrained BERT model files:
- Download the BERT pretrained model from S3 and rename it:
  1. `wget https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-pytorch_model.bin`
  2. `mv bert-base-uncased-pytorch_model.bin pytorch_model.bin`
- Download the BERT config file from S3 and rename it:
  1. `wget https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-config.json`
  2. `mv bert-base-uncased-config.json config.json`
- Download the BERT vocab file from S3 and rename it:
  1. `wget https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt`
  2. `mv bert-base-uncased-vocab.txt bert_vocab.txt`
- Place the model, config, and vocab files into the `./src/strongClassifier/pybert/pretrain/bert/base-uncased` directory.
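
As a quick sanity check that the files are in place and named correctly, something like the following should load them. This is a minimal sketch using the `transformers` API; the repo's own pybert loading code may differ:

```python
from transformers import BertModel, BertTokenizer

BERT_DIR = "./src/strongClassifier/pybert/pretrain/bert/base-uncased"

# config.json and pytorch_model.bin are found in the directory automatically;
# the vocab file was renamed to bert_vocab.txt, so it is passed explicitly.
model = BertModel.from_pretrained(BERT_DIR)
tokenizer = BertTokenizer(vocab_file=f"{BERT_DIR}/bert_vocab.txt", do_lower_case=True)

print(tokenizer.tokenize("Checking that the pretrained files load."))
```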
- Modify your data format according to the Kaggle data and place it in `pybert/dataset` (see the sketch below).
- You can modify `io.task_data.py` to adapt your data.
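
For illustration only, here is a hypothetical reader, assuming a Kaggle-style CSV with a `text` column and one binary column per label. All column names below are placeholders; match them to your actual data and to what `io.task_data.py` expects:

```python
import csv

# Hypothetical label columns -- replace with your dataset's actual names.
LABEL_COLUMNS = ["label_a", "label_b"]

def load_examples(path):
    """Read (text, multi-hot labels) pairs from a Kaggle-style CSV."""
    examples = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            labels = [int(row[col]) for col in LABEL_COLUMNS]
            examples.append((row["text"], labels))
    return examples
```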
Then:
- Run `python run_bert.py --do_data` to preprocess the data.
- Run `python run_bert.py --do_train --save_best --do_lower_case` to fine-tune the BERT model.
- Run `python run_bert.py --do_test --do_lower_case` to predict on new data.
You need to download the pretrained GloVe embeddings: download http://nlp.stanford.edu/data/wordvecs/glove.6B.zip, unzip it, and put `glove.6B.100d.txt` into `./src/seq2seq/`.
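
If useful, a minimal sketch for loading the vectors into a dict (each line of the file is a token followed by 100 space-separated floats):

```python
import numpy as np

def load_glove(path="./src/seq2seq/glove.6B.100d.txt"):
    """Map each token to its 100-d embedding vector."""
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            token, *values = line.rstrip().split(" ")
            embeddings[token] = np.asarray(values, dtype=np.float32)
    return embeddings

vectors = load_glove()
print(vectors["the"].shape)  # (100,)
```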
You need to install the R software environment and its packages "mclust" and "rjson".