IPL-Data-Analysis

This repo contains the code and report for the EDA and data analysis done during the project.

Primary language: Jupyter Notebook. License: MIT.

Phase 1:

Performed exploratory data analysis (EDA). The following tasks were accomplished:

  1. Which team had the maximum number of wins in each season?
  2. Which stadium hosted the most IPL matches?
  3. Which team has won the most matches (by count and by percentage)?
  4. Which player has won the most number of Man of the Match (MoM) awards?
  5. Which team has won the most tosses?
  6. What are the top 10 greatest victories (by runs and by wickets)?
  7. Most 50s and 100s scored.
  8. Plotly scatter plot for comparing any given batsmen.
  9. Made a scatter plot with mean strike rate per over on the y-axis, over number on the x-axis, color by batsman (taking the top 20-30 batsmen according to score), and marker size as the number of balls faced by them (to get an understanding of their experience).
  10. Made a bi-histogram plot for a given Team1 vs Team2 with the years on the x-axis. This gives an estimate of how the two teams perform against each other over the years. Wrapped it in a function so that any two teams can be passed in easily.
  11. Used Tableau's dashboard feature to produce plots for the top batsmen and bowlers.
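
As a rough illustration of tasks 1 and 10, the sketch below uses pandas on a tiny made-up table. The column names (`season`, `team1`, `team2`, `winner`), the team labels and the helper names `top_winner_per_season` and `head_to_head` are assumptions for this example only; they are not taken from the project's notebooks or dataset.

```python
import pandas as pd

# Made-up sample data; the column names are assumed, not the project's schema.
matches = pd.DataFrame({
    "season": [2016, 2016, 2016, 2017, 2017, 2017],
    "team1":  ["A", "B", "A", "A", "B", "C"],
    "team2":  ["B", "C", "C", "B", "A", "A"],
    "winner": ["A", "B", "A", "B", "B", "A"],
})


def top_winner_per_season(df):
    """Return one row per season: the team with the most wins and its win count."""
    wins = df.groupby(["season", "winner"]).size().reset_index(name="wins")
    best_rows = wins.groupby("season")["wins"].idxmax()
    return wins.loc[best_rows].reset_index(drop=True)


def head_to_head(df, team_a, team_b):
    """Count wins per season for matches played between team_a and team_b."""
    pair = df[df["team1"].isin([team_a, team_b]) & df["team2"].isin([team_a, team_b])]
    counts = pair.groupby(["season", "winner"]).size().unstack(fill_value=0)
    # Guarantee both team columns exist even if one team never won.
    return counts.reindex(columns=[team_a, team_b], fill_value=0)


season_leaders = top_winner_per_season(matches)
a_vs_b = head_to_head(matches, "A", "B")
print(season_leaders)
print(a_vs_b)
```

The table returned by `head_to_head` has one row per season and one column per team, so it can be passed to a grouped bar chart in any plotting library to get the side-by-side comparison over the years.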

Phase 2:

Used different machine learning models for a classification task (predicting the winner of a match) and a regression task (predicting the final score). Finally, compared the accuracies to find the best-performing model for each task.

  • Models used for the classification task: Logistic Regression, Support Vector Machine, Decision Tree, and Random Forest
  • Models used for the regression task: Linear Regression, Random Forest, and Linear SVR
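
A comparison of this kind can be sketched with scikit-learn as below. This is only an illustration on synthetic data from `make_classification` and `make_regression`; the features, the train/test split, the hyperparameters and the helper name `compare` are assumptions, not the project's actual setup.

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import accuracy_score, r2_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC, LinearSVR
from sklearn.tree import DecisionTreeClassifier


def compare(models, X, y, metric):
    """Fit every model on the same train split and score it on the same test split."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
    scores = {}
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        scores[name] = metric(y_te, model.predict(X_te))
    return scores


# Synthetic stand-in for the match-winner classification data.
X_clf, y_clf = make_classification(n_samples=400, n_features=8, random_state=0)
clf_scores = compare({
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Support Vector Machine": SVC(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
}, X_clf, y_clf, accuracy_score)

# Synthetic stand-in for the final-score regression data.
X_reg, y_reg = make_regression(n_samples=400, n_features=8, noise=10.0, random_state=0)
reg_scores = compare({
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(random_state=0),
    "Linear SVR": LinearSVR(max_iter=10000, random_state=0),
}, X_reg, y_reg, r2_score)

best_clf = max(clf_scores, key=clf_scores.get)
best_reg = max(reg_scores, key=reg_scores.get)
print(clf_scores, "->", best_clf)
print(reg_scores, "->", best_reg)
```

Scoring every model on the same held-out split keeps the comparison fair. Cross-validation would give a more stable ranking on a small dataset.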

Phase 3:

Designed neural networks for the same two tasks. Observed the accuracy and R2 score versus epoch trends for the training and validation sets and tuned the hyperparameters based on them.
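
The per-epoch tracking described here could look roughly like the sketch below. It uses scikit-learn's `MLPClassifier` as a stand-in, since this README does not say which deep learning framework the project used. The synthetic data, the layer size, the learning rate and the number of epochs are all assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data, split into training and validation sets.
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

# Scale using statistics from the training set only.
scaler = StandardScaler().fit(X_tr)
X_tr, X_val = scaler.transform(X_tr), scaler.transform(X_val)

# Assumed hyperparameters; these are the values one would tune.
net = MLPClassifier(hidden_layer_sizes=(16,), learning_rate_init=0.01, random_state=0)
classes = np.unique(y_tr)
n_epochs = 30
history = {"train": [], "val": []}

for epoch in range(n_epochs):
    # Each partial_fit call performs one pass over the training data.
    net.partial_fit(X_tr, y_tr, classes=classes)
    history["train"].append(net.score(X_tr, y_tr))
    history["val"].append(net.score(X_val, y_val))

# Epoch (1-based) with the highest validation accuracy.
best_epoch = int(np.argmax(history["val"])) + 1
print("best validation accuracy at epoch", best_epoch)
```

A growing gap between the training and validation curves suggests overfitting, which is the usual cue for changing the layer size, the learning rate or the number of epochs. For the regression task the same loop works with `MLPRegressor`, whose `score` method returns the R2 score.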


A detailed discussion is given in the attached report.