
Post-Sale Automobile Report - Using Spark

Primary language: Python
License: GNU General Public License v3.0 (GPL-3.0)

sb-miniproject6

This project does the same job as sb-miniproject5, but with a Spark job instead of Hadoop MapReduce.

The purpose of this project is to illustrate the power of Spark compared to Hadoop MapReduce. The final solution is simpler and faster.
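To make the comparison concrete, here is a minimal PySpark sketch of what such a report job can look like. It is not the code in this repository. It assumes the sb-miniproject5 input is comma-separated rows of `incident_id,incident_type,vin_number,make,model,year,incident_date,description`, where type `I` is the initial sale (the only row carrying make and year) and `A` is an accident, and that the report counts accidents per make and year. All function names are hypothetical.

```python
# Minimal PySpark sketch of a post-sale accident report (hypothetical helpers).
# ASSUMPTION: rows look like
#   incident_id,incident_type,vin_number,make,model,year,incident_date,description
# with "I" = initial sale (carries make/year) and "A" = accident.
# Descriptions are assumed not to contain commas.
from operator import add
from typing import Iterable, List, Tuple

Record = Tuple[str, str, str]  # (incident_type, make, year)


def extract_vin_key_value(line: str) -> Tuple[str, Record]:
    """Key one CSV row by VIN: (vin, (incident_type, make, year))."""
    fields = line.split(",")
    return fields[2], (fields[1], fields[3], fields[5])


def populate_make(records: Iterable[Record]) -> List[Record]:
    """For one VIN, copy make/year from the sale row onto its accident rows."""
    records = list(records)
    make, year = "", ""
    for incident_type, rec_make, rec_year in records:
        if incident_type == "I":
            make, year = rec_make, rec_year
    # Keep only accident rows, now labeled with the vehicle's make and year.
    return [(t, make, year) for t, _, _ in records if t == "A"]


def extract_make_key_value(record: Record) -> Tuple[str, int]:
    """Turn an accident record into a ("make-year", 1) pair for counting."""
    _, make, year = record
    return f"{make}-{year}", 1


def build_report(sc, input_path: str):
    """Chain the helpers into an RDD pipeline on an existing SparkContext `sc`.

    Returns an RDD of ("make-year", accident_count) pairs.
    """
    return (
        sc.textFile(input_path)
        .map(extract_vin_key_value)
        .groupByKey()  # gather every record of one VIN together
        .flatMap(lambda kv: populate_make(kv[1]))
        .map(extract_make_key_value)
        .reduceByKey(add)  # sum the 1s per make-year
    )
```

With pyspark installed, `sc = SparkContext("local[*]", "post-sale-report")` followed by `build_report(sc, "data.csv").collect()` would run it locally (the input path is a placeholder). In MapReduce this logic typically needs two chained jobs, one keyed by VIN and one keyed by make-year. In Spark the two stages are just consecutive transformations in one driver program, which is where the simplicity comes from.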

Requirements

  • Hadoop and Spark are installed and configured properly.
  • The pyspark module is installed.
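A quick pre-flight check can confirm the prerequisites before running the project. This is a sketch, not part of the repository: `check_requirements` is a hypothetical helper, and it assumes a standard install puts the `spark-submit` and `hadoop` launchers on PATH. Finding them only shows the tools are installed, not that they are configured properly.

```python
# Pre-flight check for the project's requirements (hypothetical helper).
# ASSUMPTION: Spark and Hadoop installs expose `spark-submit` and `hadoop`
# on PATH; adjust the names if your setup differs.
import importlib.util
import shutil


def check_requirements() -> dict:
    """Report which prerequisites can be found from this Python environment."""
    return {
        # find_spec returns None when the module cannot be imported.
        "pyspark module": importlib.util.find_spec("pyspark") is not None,
        # shutil.which returns None when the executable is not on PATH.
        "spark-submit on PATH": shutil.which("spark-submit") is not None,
        "hadoop on PATH": shutil.which("hadoop") is not None,
    }


for name, found in check_requirements().items():
    print(f"{name}: {'OK' if found else 'missing'}")
```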

Set Up and Run the Project

Clone the project into your local working directory:

$ git clone https://github.com/trdtnguyen/sb-miniproject6.git
$ cd sb-miniproject6

To run the project, execute the run.sh script:

$ ./run.sh