README

Quick summary: This code implements a spectral method (third-order tensor decomposition) for learning LDA topic models on Spark.
Version: 0.0

Install sbt.
Open your terminal.
cd to SpectralLDA-Tensor.
Type "sbt" (or "sbt -mem" followed by a memory size in megabytes, e.g., "sbt -mem 40960").
In sbt, type: "run src/main/resources/Data/datasets/synthetic/samples_train_libsvm.txt --synthetic 1"

Summary of set up: The main file is SpectralLDA-Tensor/src/main/scala-2.10/SpectralLDA.scala
Configuration
(1). Synthetic Experiments:

cd SpectralLDA-Tensor/

sbt

run src/main/resources/Data/datasets/synthetic/samples_train_libsvm.txt --synthetic 1

(1.1). A MATLAB data generation script is provided in the repository. You can vary hyperparameters such as the sample size, the vocabulary size, the hidden dimension (number of topics), and how mixed the topics are. The synthetic training and test data are then generated as datasets/synthetic/samples_train_libsvm.txt and datasets/synthetic/samples_test_libsvm.txt in the libsvm format, and as datasets/synthetic/samples_train_DOK.txt and datasets/synthetic/samples_test_DOK.txt in the DOK format.
(1.2). Our program reads the libsvm format.
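
To illustrate the libsvm format mentioned in (1.2): each line holds a label followed by space-separated index:value pairs. The sketch below parses one such line using plain Scala. The object name LibsvmLine and its parse method are hypothetical and are not taken from SpectralLDA.scala; the sketch assumes a well-formed, non-empty line and does not check index ordering.

```scala
// Illustrative libsvm line parser (hypothetical helper, not the project's own code).
object LibsvmLine {
  /** Parses "label idx:val idx:val ..." into (label, list of (index, value)). */
  def parse(line: String): (Double, List[(Int, Double)]) = {
    // Split on any run of whitespace; the first token is the label.
    val tokens = line.trim.split("\\s+")
    val label = tokens.head.toDouble
    // Every remaining token is expected to look like "index:value".
    val features = tokens.tail.toList.map { tok =>
      val Array(idx, value) = tok.split(":")
      (idx.toInt, value.toDouble)
    }
    (label, features)
  }
}
```

For example, LibsvmLine.parse("1 3:2 7:1") yields the label 1.0 and the pairs (3, 2.0) and (7, 1.0), which for a document could be read as word 3 occurring twice and word 7 once.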
(2). Real Experiments:

cd SpectralLDA-Tensor/

sbt -mem 1024

run <PATH_OF_YOUR_TEXT>

For example:

run src/main/resources/Data/datasets/enron_email/corpus.txt

(2.1). Our program takes raw text. NOTE: each line of the text file should hold exactly one document.
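
As a rough sketch of the one-document-per-line convention in (2.1), the code below turns each non-blank line into a bag of word counts. The object name BagOfWords, the lowercasing, and the split on non-letter characters are assumptions made for illustration; the tokenization actually used by the program may differ.

```scala
// Illustrative bag-of-words construction (hypothetical helper, not the project's own code).
object BagOfWords {
  /** Lowercases one document, splits on runs of non-letters, and counts each token. */
  def wordCounts(document: String): Map[String, Int] =
    document.toLowerCase
      .split("[^a-z]+")
      .filter(_.nonEmpty) // drop the empty token produced by leading separators
      .groupBy(identity)
      .map { case (word, occurrences) => (word, occurrences.length) }

  /** Treats every non-blank line as one document. */
  def corpus(lines: Iterator[String]): List[Map[String, Int]] =
    lines.map(_.trim).filter(_.nonEmpty).map(wordCounts).toList
}
```

The lines could come from scala.io.Source.fromFile(path).getLines(), where path points at a corpus file such as the one in the example above.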
Dependencies
- You should install sbt before running this program.
- See build.sbt for the dependencies.

animakumar/SpectralLDA-TensorSpark