Running the Models on Text Data
jcui1224 opened this issue · 2 comments
Hello,
I cannot find the code in the repo to reproduce the NLP data results. I followed your instructions to preprocess the data and get the BERT embeddings, but the training command seems to be missing, and train_semisup_flowgmm_tabular.py fails with an unexpected error.
I tried to adapt the code and command for tabular data to run Ag-News. It runs but the accuracy is quite low (~0.3). I guess I must have missed something important.
Could you please share the code or provide some guidance to reproduce the NLP data results?
I really appreciate your help and time!
Hi Jiali,
Thanks for taking an interest in FlowGMM! Sorry about the disorganized state of the code for the NLP datasets. In the next couple of days we'll host the preprocessed versions of the datasets online, so that you can avoid the pitfalls and inconvenience of going through these steps manually.
The command for training FlowGMM on YAHOO answers is: python flowgmm_tabular_new.py --trainer_config "{'unlab_weight':.2}" --net_config "{'k':1024,'coupling_layers':7,'nperlayer':1}" --network RealNVPTabularWPrior --trainer SemiFlow --num_epochs 200 --dataset YAHOO --lr 3e-4 --train 800
Likewise for AG-NEWS the command is python flowgmm_tabular_new.py --trainer_config "{'unlab_weight':.6}" --net_config "{'k':1024,'coupling_layers':7,'nperlayer':1}" --network RealNVPTabularWPrior --trainer SemiFlow --num_epochs 100 --dataset AG_News --lr 3e-4 --train 200
Stay tuned for an update in the next couple of days for automatic dataset download like for the other 2 tabular datasets.
Cheers!
We've updated the repo, and you should now be able to run the models by following the instructions in the readme. After pip installing, running the above commands will download the datasets, and you should get numbers similar to those in the table.
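For anyone scripting these runs, here is a minimal sketch (not part of the repo) that assembles the two commands from this thread as argument lists, which sidesteps shell-quoting mistakes around the dict-literal arguments. The helper `build_command` and the `RUNS` table are hypothetical names introduced for illustration; the flag names and values are copied from the commands above. It assumes the script accepts dict literals with normal Python spacing (e.g. `{'unlab_weight': 0.2}` in place of `{'unlab_weight':.2}`), which I have not verified.

```python
import shlex

# Per-dataset hyperparameters, copied from the two commands in this thread.
RUNS = {
    "YAHOO": {"unlab_weight": 0.2, "num_epochs": 200, "train": 800},
    "AG_News": {"unlab_weight": 0.6, "num_epochs": 100, "train": 200},
}

# Network settings shared by both commands.
NET_CONFIG = {"k": 1024, "coupling_layers": 7, "nperlayer": 1}


def build_command(dataset, script="flowgmm_tabular_new.py"):
    """Return the training command for `dataset` as a list of arguments."""
    run = RUNS[dataset]
    return [
        "python", script,
        # Assumption: the script parses these values as Python dict literals.
        "--trainer_config", repr({"unlab_weight": run["unlab_weight"]}),
        "--net_config", repr(NET_CONFIG),
        "--network", "RealNVPTabularWPrior",
        "--trainer", "SemiFlow",
        "--num_epochs", str(run["num_epochs"]),
        "--dataset", dataset,
        "--lr", "3e-4",
        "--train", str(run["train"]),
    ]


if __name__ == "__main__":
    for name in RUNS:
        # Print a shell-safe version of each command for inspection.
        print(shlex.join(build_command(name)))
        # To launch instead: subprocess.run(build_command(name), check=True)
```

Passing the list straight to `subprocess.run` avoids having to nest single quotes inside double quotes by hand, which is an easy place to slip when adapting the commands to a new dataset.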