Simple classification engine for government/municipality documents, built with TensorFlow. Documents are tagged based on the occurrence of certain words and other characteristics of the document.
This project is a prototype for
Built during Prague Hacks 2016
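To illustrate what tagging "based on occurrence of certain words" can look like, here is a minimal sketch of turning a document into a 0/1 word-occurrence vector. It is not taken from this project's code: the function name `occurrence_features`, the vocabulary, and the sample sentence are all hypothetical, and the real feature vector in features.csv may include other document characteristics as well.

```python
import re


def occurrence_features(text, vocabulary):
    """Return a 0/1 list: 1 where the vocabulary word occurs in the text."""
    # Lowercase and split into word tokens so matching ignores case
    # and punctuation.
    words = set(re.findall(r"\w+", text.lower(), re.UNICODE))
    return [1 if word in words else 0 for word in vocabulary]


# Hypothetical vocabulary and document, for illustration only.
vocabulary = ["budget", "tender", "school", "road"]
document = "The council approved the budget for road repairs."
print(occurrence_features(document, vocabulary))  # [1, 0, 0, 1]
```

A vector like this, one per document, is the kind of input a classifier can be trained on.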
Requirements:
- bash
- python 2.7
- numpy
- tensorflow 0.10.0 (does not work with 0.11.0rc0 due to tensorflow/tensorflow#4715)
Prepare data:
- copy tagged content files to ./input
- copy feature vector to features.csv
export CATS=`cat cats.txt`
bash generate-all.sh features.csv $CATS
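Because $CATS is passed unquoted, the shell splits the contents of cats.txt on whitespace and hands each category to the script as a separate argument. The sketch below mirrors that splitting in Python. The helper names `split_categories` and `generate_command` and the sample category names are hypothetical, and it assumes category names contain no spaces or shell wildcard characters.

```python
def split_categories(text):
    """Split on any run of whitespace, like the shell's default word splitting."""
    return text.split()


def generate_command(features_path, categories):
    """Build the argument list equivalent to: bash generate-all.sh FEATURES CATS..."""
    return ["bash", "generate-all.sh", features_path] + list(categories)


# Hypothetical contents of cats.txt, for illustration only.
sample = "transport\neducation budget\n"
cats = split_categories(sample)
print(generate_command("features.csv", cats))
```

The same category list is reused, in the same order, by the training and prediction commands.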
Train DNN:
python train.py $CATS
Run classification on new data:
python predict.py features.csv $CATS output.csv