Extracting-Relevant-Document

Primary language: Java

Big-Data

Projects based on Big Data.

This project performs document analysis: given a large corpus, it extracts the relevant documents using Term Frequency-Inverse Document Frequency (TF-IDF), a feature extraction technique.
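To make the scoring idea concrete, here is a minimal sketch in plain Java, without Hadoop or Spark. It is not the project's code. It assumes one common TF-IDF variant (term count divided by document length, multiplied by the natural log of corpus size over document frequency) and assumes that "relevant" means relevant to a short text query. The class name `TfIdfSketch`, its method names, and the sample documents are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: a single-machine TF-IDF ranking sketch.
public class TfIdfSketch {

    // Lowercase the text and split on runs of non-alphanumeric characters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Term frequency: occurrences of the term divided by the document length.
    static double tf(String term, List<String> doc) {
        if (doc.isEmpty()) {
            return 0.0;
        }
        long count = doc.stream().filter(term::equals).count();
        return (double) count / doc.size();
    }

    // Inverse document frequency: ln(N / df), or 0 if no document has the term.
    static double idf(String term, List<List<String>> corpus) {
        long df = corpus.stream().filter(d -> d.contains(term)).count();
        return df == 0 ? 0.0 : Math.log((double) corpus.size() / df);
    }

    // Score of one document: sum of tf * idf over the query terms.
    static double score(List<String> query, List<String> doc, List<List<String>> corpus) {
        double total = 0.0;
        for (String term : query) {
            total += tf(term, doc) * idf(term, corpus);
        }
        return total;
    }

    // Index of the highest-scoring document, or -1 for an empty corpus.
    static int mostRelevant(String query, List<String> rawDocs) {
        List<List<String>> corpus = new ArrayList<>();
        for (String d : rawDocs) {
            corpus.add(tokenize(d));
        }
        List<String> q = tokenize(query);
        int best = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < corpus.size(); i++) {
            double s = score(q, corpus.get(i), corpus);
            if (s > bestScore) {
                bestScore = s;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up sample corpus for demonstration.
        List<String> docs = Arrays.asList(
                "Hadoop runs map reduce jobs",
                "Spark SQL queries structured data",
                "Spark RDD is a distributed collection");
        System.out.println(mostRelevant("spark rdd", docs));
    }
}
```

With this sample corpus, the query "spark rdd" selects the third document (index 2): "rdd" appears in only one document, so it carries more weight than "spark", which appears in two. A distributed version would compute the same term counts and document frequencies across many machines.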

This project uses Hadoop MapReduce, Spark RDD, and Spark SQL.

All of these programs are available in these files, each with an explanation.