/big-data-traning

A program that analyzes Reddit comments on politics: it parses the comment text into a clean format, uses it to train a Spark classifier, and then studies data points and trends on Reddit.

			CS143 Project 2B

(Extra credit) Task 10 part 5: We compute the percentage of positive and negative comments for each month.
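
As a rough illustration of what that monthly computation involves, here is a minimal sketch in plain Python (not Spark, so it runs without a cluster). The row layout `(created_utc, is_positive, is_negative)` and the function name `monthly_sentiment_percentages` are assumptions made for this example and are not taken from `reddit_model.py`.

```python
from collections import defaultdict
from datetime import datetime, timezone


def monthly_sentiment_percentages(rows):
    """Return {"YYYY-MM": (percent_positive, percent_negative)}.

    rows: iterable of (created_utc, is_positive, is_negative) tuples,
    where created_utc is a Unix timestamp and the two flags are 0 or 1.
    This layout is hypothetical; adapt it to the real columns.
    """
    # month -> [comment count, positive count, negative count]
    totals = defaultdict(lambda: [0, 0, 0])
    for created_utc, is_positive, is_negative in rows:
        month = datetime.fromtimestamp(created_utc, tz=timezone.utc).strftime("%Y-%m")
        bucket = totals[month]
        bucket[0] += 1
        bucket[1] += int(is_positive)
        bucket[2] += int(is_negative)
    # Convert counts to percentages of all comments in that month.
    return {
        month: (100.0 * pos / count, 100.0 * neg / count)
        for month, (count, pos, neg) in sorted(totals.items())
    }
```

A comment can be flagged as both positive and negative in this sketch, so the two percentages for a month do not have to add up to 100.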

To run the script:
Usage: `spark-submit reddit_model.py` or run in `pyspark` shell
Additional: The program requires Java 8, not Java 11.
 - To change the Java version on Linux, run `sudo update-alternatives --config java`

To run analysis.R, you may need to move and rename the .csv files. If the script still cannot find the .csv files after you have moved them, edit the file references in the R files to use the full path.
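
If the .csv files were written by Spark's CSV writer, each output is a directory containing a `part-*.csv` file rather than a single named file. Under that assumption, the move-and-rename step could be scripted roughly as follows. The helper name `collect_spark_csv` and any paths you pass to it are hypothetical; use the names that analysis.R actually expects.

```python
import glob
import os
import shutil


def collect_spark_csv(output_dir, dest_path):
    """Copy the single part-*.csv inside a Spark output directory to dest_path.

    Returns the absolute destination path, which can be pasted into the
    R file if a relative reference does not resolve.
    """
    parts = sorted(glob.glob(os.path.join(output_dir, "part-*.csv")))
    if len(parts) != 1:
        # Several part files mean the output was not written as one partition.
        raise ValueError(
            f"expected exactly one part-*.csv in {output_dir}, found {len(parts)}"
        )
    shutil.copyfile(parts[0], dest_path)
    return os.path.abspath(dest_path)
```

The function copies rather than moves, so the original Spark output stays in place if something goes wrong.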