Bundesliga-Big-Data-Analysis-using-PySpark

Big Data analysis of a Bundesliga football league dataset, carried out in a Jupyter Notebook using PySpark, Spark SQL, and NumPy.

Primary language: Jupyter Notebook

Big Data Analysis on Bundesliga using PySpark

CONTEXT

The Bundesliga is a professional association football league in Germany. It sits at the top of the German football league system and is the country's primary football competition.

PySpark is the Python API for Apache Spark, an open-source distributed computing framework and set of libraries for real-time, large-scale data processing.

OBJECTIVE

The objective of this project is to perform Big Data analysis on a Bundesliga (German football league) dataset using PySpark.

Questions-

Q1- Who won the D1 division of the German football league system (Bundesliga) in each season between 2000 and 2010?

Q2- Which teams have been relegated in the past 10 years?

Q3- Does Oktoberfest affect the performance of Bundesliga teams?

Q4- Which Bundesliga season was the most competitive in the last decade?

Q5- What's the best month to watch Bundesliga?
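
As an illustration of the kind of aggregation Q1 calls for, here is a minimal sketch in plain Python (so it runs without a Spark session). It is not the notebook's code: the row layout, the team names, and the `season_winners` helper are all hypothetical, and the sample matches are made up. It assumes the usual league scoring of 3 points for a win and 1 for a draw, with goal difference as the tie-breaker. In PySpark the same idea would typically be expressed as a group-by over season and team followed by a ranking step.

```python
from collections import defaultdict

# Hypothetical match rows (not from the real dataset):
# (season, division, home_team, away_team, home_goals, away_goals)
matches = [
    (2001, "D1", "Team A", "Team B", 2, 1),
    (2001, "D1", "Team B", "Team C", 0, 0),
    (2001, "D1", "Team C", "Team A", 1, 3),
    (2001, "D1", "Team B", "Team A", 1, 1),
    (2002, "D1", "Team A", "Team B", 0, 2),
    (2002, "D1", "Team B", "Team C", 1, 0),
    (2002, "D1", "Team C", "Team A", 2, 2),
]


def season_winners(rows, division="D1"):
    """Return {season: team with the most points} for one division.

    Assumes 3 points for a win, 1 for a draw, 0 for a loss,
    and breaks ties on goal difference.
    """
    # table[season][team] = [points, goal_difference]
    table = defaultdict(lambda: defaultdict(lambda: [0, 0]))

    for season, div, home, away, home_goals, away_goals in rows:
        if div != division:
            continue
        if home_goals > away_goals:
            home_pts, away_pts = 3, 0
        elif home_goals < away_goals:
            home_pts, away_pts = 0, 3
        else:
            home_pts, away_pts = 1, 1

        table[season][home][0] += home_pts
        table[season][away][0] += away_pts
        table[season][home][1] += home_goals - away_goals
        table[season][away][1] += away_goals - home_goals

    # Pick the team with the highest (points, goal difference) per season.
    return {
        season: max(teams, key=lambda team: tuple(teams[team]))
        for season, teams in table.items()
    }


winners = season_winners(matches)
print(winners)  # {2001: 'Team A', 2002: 'Team B'}
```

The relegation question (Q2) could reuse the same per-season table by taking the lowest-ranked teams instead of the highest, though the exact number of relegation places would need to be checked against the league rules for each season.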