
Python 3 wrapper around the Stanford Topic Modeling Toolbox. Intended to be used for hassle-free supervised topic classification with Labeled Latent Dirichlet Allocation (L-LDA, LLDA, sLDA).

Primary language: Python. License: GNU General Public License v2.0 (GPL-2.0).

topbox

A small Python 3 wrapper around the Stanford Topic Modeling Toolbox (STMT) that makes working with L-LDA a bit easier, with no need to leave the Python environment. More information on its workings can be found here.

Setting up

Docker Setup

On Linux, the Docker setup looks something like this:

git clone https://github.com/jonaschn/topbox
cd topbox/box
wget http://nlp.stanford.edu/software/tmt/tmt-0.4/tmt-0.4.0.jar
cd ..
docker build -t jonaschn/topbox:latest .
docker run -v `pwd`:/opt/topbox -it jonaschn/topbox:latest /bin/bash

To check that the setup works, run the test script with python test.py.
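If the test fails, one thing worth ruling out is a missing or broken jar download. The following is a minimal sketch of such a check, assuming the jar sits in box/ under the repository root as in the commands above. The helper name stmt_jar_ok is hypothetical and not part of topbox.

```python
import zipfile
from pathlib import Path

JAR_NAME = "tmt-0.4.0.jar"  # file name taken from the wget command above


def stmt_jar_ok(repo_root="."):
    """Return True if box/tmt-0.4.0.jar exists and is a readable zip archive.

    Hypothetical helper, not part of topbox. A jar is a zip file, so a
    truncated download or a saved HTML error page fails the is_zipfile check.
    """
    jar_path = Path(repo_root) / "box" / JAR_NAME
    return jar_path.is_file() and zipfile.is_zipfile(jar_path)


if __name__ == "__main__":
    print("STMT jar found" if stmt_jar_ok() else "STMT jar missing or corrupt")
```

Run it from the repository root, or pass the path to the cloned repository as repo_root.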

Manual Setup

You need an older Java SDK, version 6 or 7. The toolbox will not work otherwise.
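To see which Java version is on your PATH before going further, something like the sketch below may help. It assumes the java executable is reachable as java and that its version banner has the usual version "…" form. The function names are illustrative and not part of topbox.

```python
import re
import subprocess


def parse_java_major(version_output):
    """Extract the major Java version from `java -version` output.

    Legacy releases report "1.6.0_45" or "1.7.0_80" (the major version is
    the second field); Java 9 and later report "9", "11.0.2", and so on.
    Returns None if no version string is found.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first, second = match.group(1), match.group(2)
    if first == "1" and second is not None:
        return int(second)
    return int(first)


def installed_java_major():
    """Run `java -version`; return the major version, or None if java cannot be run."""
    try:
        result = subprocess.run(
            ["java", "-version"], capture_output=True, text=True, check=False
        )
    except OSError:
        return None
    # `java -version` writes its banner to stderr; stdout is appended as a fallback.
    return parse_java_major(result.stderr + result.stdout)


if __name__ == "__main__":
    major = installed_java_major()
    if major in (6, 7):
        print(f"Java {major} found; this matches the stated requirement.")
    else:
        print(f"Detected Java major version: {major}; Java 6 or 7 is required.")
```

The parsing is kept separate from the subprocess call so that it can be checked against sample banners without Java installed.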