
Python

tf-idf-

This assignment is to write Python scripts for Apache Spark. The tasks are divided into three parts:

WordCount - Count the occurrences of words on a per-book basis and compare the results with those of Assignment 1.
pyspark.ml.feature - Compute the tf-idf values for unigrams and bigrams using the pyspark.ml.feature package of Spark's MLlib library. Measure the execution time using 5, 10, and 15 reducers.
Word2Vec - Find the feature vectors of words using the Word2Vec class of Spark's MLlib library.
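
As a small reference for the first two tasks, the per-book counts and tf-idf values can be sketched in plain Python. This is not the Spark code from this repository, only a way to sanity-check Spark output on a toy corpus. The tokenization (lowercase, whitespace split), the `books` sample, and all function names are assumptions made for illustration. The idf term uses the smoothed form log((N + 1) / (df + 1)) given in the Spark MLlib documentation for its IDF; values produced through a hashing step in Spark may differ if terms collide.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on whitespace; the real scripts may differ.
    return text.lower().split()


def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def per_book_counts(books, n=1):
    """Map each book id to a Counter of its n-gram occurrences."""
    return {book: Counter(ngrams(tokenize(text), n)) for book, text in books.items()}


def tf_idf(counts):
    """Compute tf-idf per book, with tf as the raw count and
    idf = log((N + 1) / (df + 1)), where N is the number of books."""
    n_docs = len(counts)
    df = Counter()
    for book_counts in counts.values():
        df.update(book_counts.keys())  # each book contributes 1 per distinct term
    return {
        book: {
            term: tf * math.log((n_docs + 1) / (df[term] + 1))
            for term, tf in book_counts.items()
        }
        for book, book_counts in counts.items()
    }


# Hypothetical toy corpus standing in for the real books.
books = {
    "book_a": "the cat sat on the mat",
    "book_b": "the dog sat",
}

unigram_counts = per_book_counts(books, n=1)
bigram_counts = per_book_counts(books, n=2)
unigram_tfidf = tf_idf(unigram_counts)
bigram_tfidf = tf_idf(bigram_counts)
```

In this toy corpus, `the` and `sat` occur in both books, so their tf-idf is zero, while a term found in only one book, such as `cat`, gets a positive weight.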

SreekarJammula/tf-idf-
