tf-idf-

The current assignment is to write Python scripts for Apache Spark. The tasks are divided into three parts:

  1. WordCount: Count the occurrences of each word on a per-book basis and compare the results with those of Assignment 1.
  2. pyspark.ml.feature: Compute the tf-idf values for unigrams and bigrams using the pyspark.ml.feature package of Spark's MLlib library. Measure the execution time using 5, 10, and 15 reducers.
  3. Word2Vec: Find the feature vectors of words using the Word2Vec class of the MLlib library.
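
To make the quantities in parts 1 and 2 concrete, here is a minimal plain-Python sketch of per-book word counts and tf-idf over unigrams and bigrams. It does not use Spark and is not the assignment's solution. The toy corpus and the helper names (`ngrams`, `term_counts`, `tf_idf`) are assumptions made up for illustration. The IDF uses the smoothed form log((N + 1) / (df + 1)), which is the formula documented for Spark's `IDF` estimator; raw counts stand in for term frequency.

```python
import math
from collections import Counter

# Hypothetical toy corpus: book id -> text (not data from the assignment).
books = {
    "book_a": "the cat sat on the mat",
    "book_b": "the dog sat on the log",
    "book_c": "a cat and a dog",
}

def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def term_counts(text, n=1):
    """Per-book term frequencies: raw counts of each n-gram."""
    return Counter(ngrams(text.lower().split(), n))

def tf_idf(corpus, n=1):
    """Compute tf-idf per book with the smoothed IDF log((N + 1) / (df + 1))."""
    tf = {book: term_counts(text, n) for book, text in corpus.items()}
    num_docs = len(corpus)
    # Document frequency: number of books containing each term at least once.
    df = Counter(term for counts in tf.values() for term in counts)
    idf = {term: math.log((num_docs + 1) / (d + 1)) for term, d in df.items()}
    return {
        book: {term: count * idf[term] for term, count in counts.items()}
        for book, counts in tf.items()
    }

word_counts = {book: term_counts(text) for book, text in books.items()}  # part 1
unigram_scores = tf_idf(books, n=1)  # part 2, unigrams
bigram_scores = tf_idf(books, n=2)   # part 2, bigrams
```

In a Spark script the same steps would typically be expressed with the feature transformers in `pyspark.ml.feature`; note that `HashingTF` maps terms to hashed indices rather than keeping exact term strings as this sketch does, so collisions are possible there.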