SMS Classifier for LG
Setting up environment
Set up python3 virtual env using:
virtualenv -p /usr/bin/python3 <envname>
example : virtualenv -p /usr/bin/python3 my_env
Use it by:
source <envname>/bin/activate
Install requirements using:
pip install -r requirements.txt
NLTK libraries:
python -m nltk.downloader 'averaged_perceptron_tagger'
python -m nltk.downloader 'wordnet'
python -m nltk.downloader 'punkt'
Usage
First you'll have to generate dependency-relation files for your dataset. To generate dependency files:
-
Download Stanford dependency parser from here https://nlp.stanford.edu/software/stanford-parser-full-2017-06-09.zip , if you don't have it already. Unzip it.
-
Set $CLASSPATH environment variable to the path to the Stanford dependency parser. export CLASSPATH=path/to/parser/stanford-parser-full-2017-06-09
-
Keep your training and testing files (tab seperated files, tsv) in a folder, say "dataset", with names train.tsv and test.tsv respectively.
dataset/ | train.tsv | test.tsv
-
Install these packages in python3 if not already installed: nltk pandas argparse pickle copy
-
Run gen_dep_relation.py with python3. python3 gen_dep_relation.py --data path/to/dataset/ Don't forget the ending "/"
-
Dependency files will be stored in the dataset folder itself. Note: It may take long time depending upon the size of dataset.
Use
source <envname>/bin/activate
python sms_class.py -h
for help on all the options For a basic minimal run, do:
python sms_class.py --data <path_to_data_files/>
Remeber that the data files need to be train.tsv, tune.tsv, test.tsv. Also don't forget the ending "/" in the path.
Following feature files must be present in the sms_class.py directory for full support:
- fire.words - Fire related words, one word per line
- health.words - health related emergency words, one per line
- personal.words - personal related words, one per line
- wsd_features - allowable wsd synsets, <stem_word>
Saving features
Use
python sms_class.py --data path_to_data_folder/ --to_weka
to save the train, tune, test files in weka format
Saving and predicting from model
Save model as:
python sms_class.py --data path_to_data_foler/ --save
The above will generate a '.model' file Predict from the above model file using:
python sms_class.py --data path_to_data_folder/ --model <model_file> --predict