
Assignment for Big Data Analysis Class

Primary Language: Java

Big Data - Wikipedia Inverted Index

Assignment 2 for Brandeis University COSI 129a Big Data Analysis Class.

This program runs on YARN/Hadoop 2.3.0 and uses the Cloud9 Big Data Wikipedia toolkit to transform a raw Wikipedia XML dump. Pages are cross-matched against a list of people's names, the page content is lemmatized with the Stanford CoreNLP library, and the resulting words are counted and indexed per page.
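The filter, lemmatize, count, and index flow can be pictured as a map phase followed by a reduce phase. The sketch is plain Java with no Hadoop, Cloud9, or CoreNLP dependency, so it runs standalone. The `Page` record, the exact title match against the name list, and the suffix-stripping `lemmatize` method are hypothetical simplifications. They are not this project's actual classes and do not reproduce CoreNLP's lemmatizer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/**
 * Standalone sketch of the indexing flow: pages whose title appears in a
 * people-name list are tokenized, "lemmatized", and counted, producing
 * lemma -> (page title -> count). Hypothetical; not the original Hadoop job.
 */
public class InvertedIndexSketch {

    /** Hypothetical stand-in for a page parsed out of the Wikipedia XML dump. */
    public record Page(String title, String text) {}

    /** Naive stand-in for CoreNLP lemmatization (assumption): lowercase, strip a plural "s". */
    public static String lemmatize(String token) {
        String t = token.toLowerCase(Locale.ROOT);
        if (t.length() > 3 && t.endsWith("s") && !t.endsWith("ss")) {
            return t.substring(0, t.length() - 1);
        }
        return t;
    }

    /** "Map" phase: emit one (lemma, title) pair per token of a page about a listed person. */
    public static List<String[]> map(Page page, Set<String> people) {
        List<String[]> out = new ArrayList<>();
        if (!people.contains(page.title())) {
            return out; // assumed matching rule: exact title match against the name list
        }
        for (String token : page.text().split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                out.add(new String[] {lemmatize(token), page.title()});
            }
        }
        return out;
    }

    /** "Reduce" phase: group pairs by lemma and count occurrences per page. */
    public static Map<String, Map<String, Integer>> reduce(List<String[]> pairs) {
        Map<String, Map<String, Integer>> index = new TreeMap<>();
        for (String[] pair : pairs) {
            index.computeIfAbsent(pair[0], k -> new TreeMap<>())
                 .merge(pair[1], 1, Integer::sum);
        }
        return index;
    }

    /** Runs map over every page, then reduces all emitted pairs into the index. */
    public static Map<String, Map<String, Integer>> buildIndex(List<Page> pages, Set<String> people) {
        List<String[]> pairs = new ArrayList<>();
        for (Page p : pages) {
            pairs.addAll(map(p, people));
        }
        return reduce(pairs);
    }

    public static void main(String[] args) {
        List<Page> pages = List.of(
            new Page("Ada Lovelace", "Ada wrote notes on engines. The notes describe an algorithm."),
            new Page("Analytical Engine", "The engine was designed by Babbage."));
        Set<String> people = Set.of("Ada Lovelace");
        System.out.println(buildIndex(pages, people));
    }
}
```

Running `main` indexes only the "Ada Lovelace" page, because "Analytical Engine" is not in the name list. As a result, `note` maps to `{Ada Lovelace=2}` and `babbage` never appears. In a real Hadoop job the map step would presumably live in a `Mapper` over Cloud9-parsed pages, and the shuffle would do the grouping before a `Reducer` sums the counts.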