Bare-bones application that reads a Wikipedia XML dump, converts the articles into TF-IDF vectors, and uses those vectors to train a Naive Bayes classifier for topic assignment.
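The TF-IDF and Naive Bayes steps described above can be sketched in plain Scala. This is a minimal illustration under stated assumptions, not this project's code: it uses ordinary collections rather than Spark, assumes documents are already tokenized, and every name in it (`TfIdfNaiveBayesSketch`, `termFrequencies`, `inverseDocFrequencies`, `tfIdf`, `train`, `predict`, `lambda`) is hypothetical. The smoothed IDF formula log((N + 1) / (df + 1)) and the additive smoothing constant are choices made for the sketch.

```scala
// Illustrative sketch only: plain Scala collections, no Spark.
// All names here are hypothetical and not taken from this project.
object TfIdfNaiveBayesSketch {
  type Doc = Seq[String] // an already tokenized document

  // Term frequency: raw count of each term within one document.
  def termFrequencies(doc: Doc): Map[String, Double] =
    doc.groupBy(identity).map { case (term, occ) => term -> occ.size.toDouble }

  // Smoothed inverse document frequency: log((N + 1) / (df + 1)).
  def inverseDocFrequencies(docs: Seq[Doc]): Map[String, Double] = {
    val n = docs.size.toDouble
    docs.flatMap(_.distinct)
      .groupBy(identity)
      .map { case (term, occ) => term -> math.log((n + 1.0) / (occ.size + 1.0)) }
  }

  // TF-IDF weights for one document; terms unseen in the corpus get weight 0.
  def tfIdf(doc: Doc, idf: Map[String, Double]): Map[String, Double] =
    termFrequencies(doc).map { case (term, tf) => term -> tf * idf.getOrElse(term, 0.0) }

  case class Model(
    logPrior: Map[String, Double],
    logLikelihood: Map[String, Map[String, Double]],
    vocabulary: Set[String])

  // Multinomial Naive Bayes over weighted features, with additive smoothing (lambda).
  def train(labeled: Seq[(String, Map[String, Double])], lambda: Double = 1.0): Model = {
    val vocabulary = labeled.flatMap(_._2.keys).toSet
    val byLabel = labeled.groupBy(_._1)
    val logPrior = byLabel.map { case (label, rows) =>
      label -> math.log(rows.size.toDouble / labeled.size)
    }
    val logLikelihood = byLabel.map { case (label, rows) =>
      // Sum the feature weights of each term across all documents with this label.
      val sums = rows.flatMap(_._2.toSeq).groupBy(_._1)
        .map { case (term, ws) => term -> ws.map(_._2).sum }
      val total = sums.values.sum + lambda * vocabulary.size
      label -> vocabulary.map { term =>
        term -> math.log((sums.getOrElse(term, 0.0) + lambda) / total)
      }.toMap
    }
    Model(logPrior, logLikelihood, vocabulary)
  }

  // Pick the label with the highest log posterior; out-of-vocabulary terms are ignored.
  def predict(model: Model, features: Map[String, Double]): String =
    model.logPrior.keys.maxBy { label =>
      model.logPrior(label) + features.collect {
        case (term, w) if model.vocabulary(term) => w * model.logLikelihood(label)(term)
      }.sum
    }
}
```

To use the sketch, compute the IDF table once over the training documents, map each labeled document through `tfIdf`, pass the results to `train`, and call `predict` on the TF-IDF vector of a new document. Spark MLlib provides `HashingTF`, `IDF`, and `NaiveBayes` classes that play similar roles on distributed data.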
To run the application:
sbt run
To run the tests:
sbt test
This is a mash-up of two blog posts, with some adjustments.
It is based on Chimpler's blog-spark-naive-bayes-reuters, published in:
It borrows some XML parsing methodology from: