Download the dataset from Google Drive, then extract the archive and rename the resulting directory:
tar zxvf release_data.tar.gz
mv release_data data
pip install -r requirements.txt
# install the spaCy English model
python -m spacy download en_core_web_sm
# create the directory that stores the TensorBoard logs and model checkpoints
mkdir "tb_logs"
python adv_nn_clf.py --tgt_domain gossip \
--src_domain politi,health_deterrent \
--epochs 50 \
--weak_labels_path ./data/weak_label_all.csv \
--pre_train_epochs 20 \
--special_tag "weight_analysis" \
--model_type new \
--is_omit_logits \
--weak_fn adverb \
--weight_decay 0 \
--main_lr_rate 0.001 \
--group_lr_rate 0.01 \
--lr_rate 0.001 \
--hyper_beta 0.3 \
--lambda 0.1 \
--weak_label_count 50
You can change the target and source domains to suit your needs. The order of the source domains does not matter.
Three weak labeling functions are available through --weak_fn: you, adverb, and swear.
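To compare the three weak labeling functions, the training command above can be repeated with only --weak_fn changed. The following is a minimal dry-run sketch: the print_runs helper is hypothetical (it is not part of the repository), and every flag is copied unchanged from the command above. It only prints the commands. Whether runs that share the same --special_tag overwrite each other's output in tb_logs is not documented here, so check that before launching them for real.

```shell
#!/bin/sh
# Dry-run sketch: print one training command per weak labeling function.
# print_runs is a hypothetical helper; all flags come from the command above,
# and only the value of --weak_fn varies between runs.
print_runs() {
  for weak_fn in you adverb swear; do
    # "echo" only prints the command; remove it to actually launch the run.
    echo python adv_nn_clf.py \
      --tgt_domain gossip \
      --src_domain politi,health_deterrent \
      --epochs 50 \
      --weak_labels_path ./data/weak_label_all.csv \
      --pre_train_epochs 20 \
      --special_tag "weight_analysis" \
      --model_type new \
      --is_omit_logits \
      --weak_fn "$weak_fn" \
      --weight_decay 0 \
      --main_lr_rate 0.001 \
      --group_lr_rate 0.01 \
      --lr_rate 0.001 \
      --hyper_beta 0.3 \
      --lambda 0.1 \
      --weak_label_count 50
  done
}

print_runs
```

Once the printed commands look right, remove the echo so that each command is executed in turn.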