/praguehacks2016-categorizer

Categorizer (a PragueHacks 2016 project)

Simple classification engine for government/municipality documents, built with TensorFlow. Documents are tagged based on the occurrence of certain words and other document characteristics.
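To illustrate what "occurrence of certain words" could look like as classifier input, here is a minimal sketch that turns a document into a vector of word counts. The vocabulary and the function name `word_occurrence_features` are hypothetical; the README does not specify the project's actual word list or feature extraction.

```python
import re
from collections import Counter

# Hypothetical vocabulary for illustration only; the project's real
# word list is not given in this README.
VOCABULARY = ["budget", "tender", "council", "permit"]


def word_occurrence_features(text, vocabulary=VOCABULARY):
    """Return one occurrence count per vocabulary word, in vocabulary order."""
    # Lowercase and split into word tokens so matching is case-insensitive.
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    # Counter returns 0 for words that never occur in the text.
    return [counts[word] for word in vocabulary]


if __name__ == "__main__":
    sample = "The council approved the budget. Budget details follow."
    print(word_occurrence_features(sample))  # [2, 0, 1, 0]
```

A vector like this, one row per document, is the sort of data a features.csv file could hold, though the real column layout may differ.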

This project is a prototype for

Built during Prague Hacks 2016

Setup

Requirements:

Run

Prepare data:

  1. copy tagged content files to ./input
  2. copy feature vector to features.csv
  3. export CATS=`cat cats.txt`
  4. bash generate-all.sh features.csv $CATS
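Steps 3 and 4 rely on the shell splitting the contents of cats.txt into separate arguments. The sketch below shows roughly the same thing in Python, assuming cats.txt holds whitespace-separated category names (its real format is not documented here). The helper names `load_categories` and `build_command` are hypothetical and not part of the project.

```python
from pathlib import Path


def load_categories(path="cats.txt"):
    """Read category names, splitting on any run of whitespace.

    This approximates how the shell word-splits an unquoted $CATS
    (it does not reproduce shell glob expansion).
    """
    return Path(path).read_text(encoding="utf-8").split()


def build_command(script, features_file, categories):
    """Build the argument list: one argument per category name."""
    return ["bash", script, features_file, *categories]


if __name__ == "__main__":
    cats = load_categories("cats.txt")
    # The list could be passed to subprocess.run(...) to launch the script.
    print(build_command("generate-all.sh", "features.csv", cats))
```

Passing the categories as a list avoids quoting problems, but it also means a category name containing spaces would be split into several arguments, just as in the shell version.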

Train DNN

  1. python train.py $CATS

Run classification on new data

  1. python predict.py features.csv $CATS output.csv