Multi-word bias neutralization
In the project folder, do the following:
$ python3 -m venv venv
$ source venv/bin/activate
$ pip install -r requirements.txt
$ python
>>> import nltk; nltk.download("punkt")
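
To verify the punkt download from the same Python session, a quick check (the sample sentence is arbitrary):

```python
import nltk

# word_tokenize uses the punkt data downloaded above;
# this raises LookupError if the download did not succeed.
print(nltk.word_tokenize("The download worked."))
# ['The', 'download', 'worked', '.']
```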
You need to download the pretrained BERT model files:
- Download the BERT pretrained model from S3 and rename it:
  1. `wget https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-pytorch_model.bin`
  2. `mv bert-base-uncased-pytorch_model.bin pytorch_model.bin`
- Download the BERT config file from S3 and rename it:
  1. `wget https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-config.json`
  2. `mv bert-base-uncased-config.json config.json`
- Download the BERT vocab file from S3 and rename it:
  1. `wget https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt`
  2. `mv bert-base-uncased-vocab.txt bert_vocab.txt`
- Place the model, config, and vocab files into the `./src/strongClassifier/pybert/pretrain/bert/base-uncased` directory.
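
As a quick sanity check that the files are in place and named correctly, something like the following should load them. This is a minimal sketch using the `transformers` API; the repo's own pybert loading code may differ:

```python
from transformers import BertModel, BertTokenizer

BERT_DIR = "./src/strongClassifier/pybert/pretrain/bert/base-uncased"

# config.json and pytorch_model.bin are found in the directory automatically;
# the vocab file was renamed to bert_vocab.txt, so it is passed explicitly.
model = BertModel.from_pretrained(BERT_DIR)
tokenizer = BertTokenizer(vocab_file=f"{BERT_DIR}/bert_vocab.txt", do_lower_case=True)

print(tokenizer.tokenize("Checking that the pretrained files load."))
```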
- Modify your data format according to the Kaggle data and place it in `pybert/dataset` (see the sketch below).
- You can modify `io.task_data.py` to adapt your data.
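
For illustration only, here is a hypothetical reader, assuming a Kaggle-style CSV with a `text` column and one binary column per label. All column names below are placeholders; match them to your actual data and to what `io.task_data.py` expects:

```python
import csv

# Hypothetical label columns -- replace with your dataset's actual names.
LABEL_COLUMNS = ["label_a", "label_b"]

def load_examples(path):
    """Read (text, multi-hot labels) pairs from a Kaggle-style CSV."""
    examples = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            labels = [int(row[col]) for col in LABEL_COLUMNS]
            examples.append((row["text"], labels))
    return examples
```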
Then:
- Run `python run_bert.py --do_data` to preprocess the data.
- Run `python run_bert.py --do_train --save_best --do_lower_case` to fine-tune the BERT model.
- Run `python run_bert.py --do_test --do_lower_case` to predict on new data.
You need to download the pretrained GloVe embeddings: download http://nlp.stanford.edu/data/wordvecs/glove.6B.zip, unzip it, and put `glove.6B.100d.txt` into `./src/seq2seq/`.
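
If useful, a minimal sketch for loading the vectors into a dict (each line of the file is a token followed by 100 space-separated floats):

```python
import numpy as np

def load_glove(path="./src/seq2seq/glove.6B.100d.txt"):
    """Map each token to its 100-d embedding vector."""
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            token, *values = line.rstrip().split(" ")
            embeddings[token] = np.asarray(values, dtype=np.float32)
    return embeddings

vectors = load_glove()
print(vectors["the"].shape)  # (100,)
```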
You need to install the R software environment and its packages "mclust" and "rjson".