- To run the project clone the repository
- Install the requirements from requirements.txt
- Make sure spark-3.0.1-bin-hadoop2.7 installed on system Find Installation Instructions here.
- Run the project using Spark-exercises notebook
- Experiment with project creating your own queries
RammySekham/spark-analytics
spark analytics using pyspark, spark dataframes and spark sql, parsing user logs, handling unstructured data
Jupyter NotebookMIT