lksv/praguehacks2016-categorizer

Categorizer (a PragueHacks 2016 project)

Simple classification engine for government/municipality documents built with TensorFlow. Documents are tagged based on the occurrence of certain words and other characteristics of a document.
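
To illustrate tagging by word occurrence, here is a minimal sketch that turns a document into a 0/1 feature vector over a keyword list. The keyword list and the function name are hypothetical and are not taken from this repository, so the project's actual feature extraction may differ.

```python
import re

# Hypothetical keyword list; the real features used by this project are not shown here.
KEYWORDS = ["budget", "tender", "meeting", "permit"]


def keyword_features(text, keywords=KEYWORDS):
    """Return one 0/1 flag per keyword, marking whether it occurs in the text."""
    # Lowercase and split into word tokens so matching is case-insensitive.
    tokens = set(re.findall(r"\w+", text.lower()))
    return [1 if kw in tokens else 0 for kw in keywords]
```

A row of such flags per document is the kind of input a feature file like features.csv could hold, though the real column layout is not documented here.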

This project is a prototype for

Built during Prague Hacks 2016

Setup

Requirements:

bash
python 2.7
numpy
tensorflow 0.10.0 (does not work with 0.11.0rc0 due to tensorflow/tensorflow#4715)
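
Because the version constraint is strict, it can help to fail early when the wrong TensorFlow is installed. The guard below is only a sketch: `check_tensorflow_version` is a hypothetical helper that is not part of this repository, and it assumes that only the exact string 0.10.0 is acceptable.

```python
REQUIRED_TF_VERSION = "0.10.0"  # version named in the requirements above


def check_tensorflow_version(installed):
    """Raise RuntimeError unless the installed version string is the required one."""
    if installed != REQUIRED_TF_VERSION:
        raise RuntimeError(
            "TensorFlow %s required, found %s" % (REQUIRED_TF_VERSION, installed)
        )
    return True
```

It could be called as `check_tensorflow_version(tf.__version__)` after `import tensorflow as tf`, before any training starts.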

Run

Prepare data:

copy tagged content files to ./input
copy feature vector to features.csv
export CATS=`cat cats.txt`
bash generate-all.sh features.csv $CATS
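
Since `$CATS` is passed unquoted, the shell splits it on whitespace, so each category name arrives at the scripts as a separate argument. The sketch below mimics that splitting in Python. It assumes cats.txt holds whitespace-separated category names with no glob characters; the names in the usage note are made up for illustration.

```python
def parse_categories(cats_text):
    """Split the contents of cats.txt the way unquoted $CATS is word-split by the shell."""
    # str.split() with no argument splits on any run of spaces, tabs, or newlines
    # and drops empty strings, matching default shell word splitting.
    return cats_text.split()
```

For example, a file containing `health`, `transport`, and `culture` on separate lines yields three arguments, in that order.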

Train the DNN:

python train.py $CATS

Run classification on new data:

python predict.py features.csv $CATS output.csv