Text classification code using SoPa, based on "SoPa: Bridging CNNs, RNNs, and Weighted Finite-State Machines" by Roy Schwartz, Sam Thomson and Noah A. Smith, ACL 2018.
The code is implemented in Python 3.6 using PyTorch. To run it, we recommend using conda:
conda env create -f environment.yml
source activate sopa
The training and test code requires two files for each of the training, development and test sets: a data file and a labels file. Both files contain one line per sample. The data file contains the text, and the labels file contains the label. In addition, a word vector file is required (plain text, in the standard format of one vector per line: the word, followed by the vector's values).
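As an illustration of the file layout described above, here is a minimal sketch of how the three inputs line up. It is not the loader used by this repository. The function names `read_samples` and `read_word_vectors` are hypothetical, as are the sample contents, and it assumes space-separated vector values with no header line.

```python
import io


def read_samples(data_lines, label_lines):
    """Pair each text line with the label on the same line number."""
    texts = [line.rstrip("\n") for line in data_lines]
    labels = [line.strip() for line in label_lines]
    if len(texts) != len(labels):
        raise ValueError(
            "data file has %d lines but labels file has %d"
            % (len(texts), len(labels))
        )
    return list(zip(texts, labels))


def read_word_vectors(vector_lines):
    """Parse 'word v1 v2 ... vn' lines into a dict of word -> list of floats."""
    vectors = {}
    for line in vector_lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


# In-memory stand-ins for the data, labels and word vector files
# (hypothetical contents, for illustration only).
data = io.StringIO("a great movie\nnot worth watching\n")
labels = io.StringIO("1\n0\n")
vecs = io.StringIO("great 0.1 0.2 0.3\nmovie 0.4 0.5 0.6\n")

samples = read_samples(data, labels)
vectors = read_word_vectors(vecs)
print(samples[0])
print(len(vectors["great"]))
```

A check like the line-count comparison in `read_samples` can catch a mismatched data/labels pair before a training run starts.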
For other parameters, run the following commands with the --help flag.
To train our model, run:
python3.6 ./soft_patterns.py \
-e <word embeddings file> \
--td <train data> \
--tl <train labels> \
--vd <dev data> \
--vl <dev labels> \
-p <pattern specification> \
--model_save_dir <output model directory>
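When launching training from another script, the command above can be assembled programmatically. The following is a sketch only: `build_train_command` is a hypothetical helper, all file names are placeholders, and the pattern specification is left as a placeholder string because its format is not described here. The flags are the ones listed above.

```python
import shlex


def build_train_command(embeddings, train_data, train_labels,
                        dev_data, dev_labels, patterns, model_dir):
    """Return the training invocation as an argument list."""
    return [
        "python3.6", "./soft_patterns.py",
        "-e", embeddings,
        "--td", train_data,
        "--tl", train_labels,
        "--vd", dev_data,
        "--vl", dev_labels,
        "-p", patterns,
        "--model_save_dir", model_dir,
    ]


# Placeholder paths (hypothetical); replace them with real files.
argv = build_train_command(
    embeddings="vectors.txt",
    train_data="train.data",
    train_labels="train.labels",
    dev_data="dev.data",
    dev_labels="dev.labels",
    patterns="<pattern specification>",
    model_dir="model_out",
)

# Print a shell-safe version of the command for inspection.
print(" ".join(shlex.quote(arg) for arg in argv))
```

Keeping the command as a list means it can be passed directly to `subprocess.run(argv)` without shell quoting concerns.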
To test our model, run:
python3.6 ./soft_patterns_test.py \
-e <word embeddings file> \
--vd <test data> \
--vl <test labels> \
-p <pattern specification> \
--input_model <input model>
Under construction.
python -m unittest
If you make use of this code, please cite the following paper:
@inproceedings{Schwartz:2018,
author={Schwartz, Roy and Thomson, Sam and Smith, Noah A.},
title={{SoPa}: Bridging {CNNs}, {RNNs}, and Weighted Finite-State Machines},
booktitle={Proc. of ACL},
year={2018}
}
For questions, comments or feedback, please email roysch@cs.washington.edu