This repository contains code for distributed data processing and analysis using Apache Spark and Dask. The analysis covers eCommerce online store reviews, including visualization of conversion rates, engagement metrics, and price range impacts.
- Data Source: eCommerce store reviews dataset
- Processing Systems:
  - Apache Spark (using `spark-shell` and Scala)
  - Dask (using Python scripts in the terminal)
- Visualization Tools:
  - Spark: JFreeChart (a Java library that can also be called from Scala within the Spark Shell)
  - Dask: Matplotlib and Seaborn (Python libraries for visualizations)
- Platform: Ubuntu terminal
- Open the Spark Shell by running `spark-shell --packages org.jfree:jfreechart:1.5.3`. This includes the JFreeChart library as a dependency when the Spark Shell starts.
- Load the Scala scripts in the repository.
- Execute the analysis scripts.
- JFreeChart renders the resulting charts.
- Open a terminal.
- Run the Python scripts in the `dask/` directory. (Use `nano` before the file name to open a file, and `python` before the file name to run it.)
- View the results using Matplotlib and Seaborn visualizations.