EDA_IPL

Exploratory Data Analysis of the IPL Dataset

  • Exploratory Data Analysis (EDA) is one of the most important parts of any data science project. It can easily account for 60-70% of the work done before modelling begins.
  • EDA is essentially a first look at the data. To build a good model from a dataset, we first need to understand the data, and EDA helps achieve that. It summarizes the data, finds patterns between data points, and is often presented visually (diagrams, graphs) so that it is easy to understand.

Real-world data is often unstructured and dirty. If we skip EDA and feed such data directly into a machine learning algorithm, we are unlikely to get a desirable result. EDA helps identify gaps in the data so that they can be filled appropriately.
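As an illustration of identifying gaps, the sketch below counts empty cells per column in a small made-up sample. It is only a minimal example under stated assumptions: the rows and the column names (`id`, `season`, `city`, `winner`, `umpire1`) are hypothetical and are not taken from the notebook, and it uses only the Python standard library so that it runs without extra packages.

```python
import csv
import io

# Made-up rows for illustration only; the column names are assumptions.
SAMPLE_MATCHES = """id,season,city,winner,umpire1
1,2008,City X,Team A,Umpire P
2,2008,,Team B,Umpire Q
3,2009,City Y,,Umpire P
4,2009,City Y,Team A,
"""


def count_missing(csv_text):
    """Return {column name: number of empty cells} for the given CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            # A cell counts as missing if it is absent or blank.
            if value is None or value.strip() == "":
                missing[name] += 1
    return missing


missing_counts = count_missing(SAMPLE_MATCHES)
print(missing_counts)
```

Columns with a non-zero count are the ones that need a decision during cleaning, for example filling in a value or dropping the affected rows.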

The objective of any EDA process is to analyse the underlying structure of a dataset, which in turn helps generate insights.

Repository Overview

This repository contains an EDA of Indian Premier League (IPL) data covering 2008 - 2018.

  • It has a Python notebook that contains all of the code.
  • It also includes the datasets used for this analysis.

Brief Overview of the steps followed in this project

Two datasets are used for the analysis. The 'matches' dataset describes every match played in the IPL between 2008 and 2018, with information on the following:

1. Season
2. Venue
3. City
4. Teams
5. Winner
6. Toss winner
7. Winning margin
8. Umpires

The 'deliveries' dataset, on the other hand, gives ball-by-ball information on every match played during 2008 - 2018. In a nutshell, it provides information on the following:

1. Runs scored per ball
2. Extras conceded by the bowlers/fielding team
3. Wickets taken
4. Types of dismissals
5. Venue

In effect, it provides the individual records of players.
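To show how the two datasets can be combined, here is a minimal sketch that links ball-by-ball rows to their match and totals the runs per batsman per season. The rows are made up, and the field names (`id`, `season`, `match_id`, `batsman`, `batsman_runs`) are assumptions chosen for illustration; the actual files may name these columns differently.

```python
from collections import defaultdict

# Made-up rows for illustration only; field names are assumptions.
sample_matches = [
    {"id": 1, "season": 2008},
    {"id": 2, "season": 2009},
]
sample_deliveries = [
    {"match_id": 1, "batsman": "Player A", "batsman_runs": 4},
    {"match_id": 1, "batsman": "Player B", "batsman_runs": 1},
    {"match_id": 1, "batsman": "Player A", "batsman_runs": 6},
    {"match_id": 2, "batsman": "Player A", "batsman_runs": 2},
]


def runs_by_season(matches, deliveries):
    """Return {(season, batsman): total runs} by joining deliveries to matches."""
    # Lookup table from match id to season (the "join key").
    season_of = {m["id"]: m["season"] for m in matches}
    totals = defaultdict(int)
    for ball in deliveries:
        season = season_of[ball["match_id"]]
        totals[(season, ball["batsman"])] += ball["batsman_runs"]
    return dict(totals)


season_runs = runs_by_season(sample_matches, sample_deliveries)
print(season_runs)
```

The same idea extends to other per-player records, such as wickets per bowler, by changing which fields are grouped and summed.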

Important observations that I made during Exploratory Data Analysis

  • Mumbai Indians is the most successful team in the IPL.
  • Mumbai Indians has won the most tosses.
  • More matches were won by chasing a total (419 matches) than by defending one (350 matches).
  • When defending a total, the biggest victory was by 146 runs (Mumbai Indians defeated Delhi Daredevils on 06 May 2017 at Feroz Shah Kotla stadium, Delhi).
  • When chasing a target, the biggest victory was by 10 wickets (that is, without losing a wicket), and there were 11 such instances.
  • Mumbai is the city that has hosted the most IPL matches.
  • Chris Gayle has won the most player of the match titles.
  • Winning the toss gives a slight edge over the opponent (52% probability of winning).
  • Five Indian players feature in the list of top ten IPL players.
  • S. Ravi (Sundaram Ravi) has officiated the most IPL matches as an on-field umpire.
  • Eden Gardens is the venue that has hosted the most IPL matches.
  • Chris Gayle is the most destructive batsman in Super Overs.
  • JP Faulkner is the most difficult bowler to face in Super Overs.
  • KKR is the best chasing team.
  • Till 2019, 40 venues had hosted 756 IPL matches.

And many more.
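Figures such as the toss advantage and the chasing/defending split listed above can be derived with simple counts. The sketch below shows one way to do that on made-up rows; the field names (`toss_winner`, `winner`, `win_by_runs`, `win_by_wickets`) are assumptions, and the numbers it prints describe the sample only, not the real IPL data.

```python
# Made-up rows for illustration only; field names are assumptions.
sample_results = [
    {"toss_winner": "Team A", "winner": "Team A", "win_by_runs": 0, "win_by_wickets": 5},
    {"toss_winner": "Team B", "winner": "Team A", "win_by_runs": 20, "win_by_wickets": 0},
    {"toss_winner": "Team C", "winner": "Team C", "win_by_runs": 0, "win_by_wickets": 10},
    {"toss_winner": "Team A", "winner": "Team B", "win_by_runs": 7, "win_by_wickets": 0},
]


def toss_win_rate(matches):
    """Fraction of decided matches in which the toss winner also won the match."""
    decided = [m for m in matches if m["winner"]]  # skip no-result matches
    if not decided:
        return 0.0
    same = sum(1 for m in decided if m["toss_winner"] == m["winner"])
    return same / len(decided)


def chase_defend_counts(matches):
    """Count wins by chasing (margin in wickets) and by defending (margin in runs)."""
    chasing = sum(1 for m in matches if m["win_by_wickets"] > 0)
    defending = sum(1 for m in matches if m["win_by_runs"] > 0)
    return {"chasing": chasing, "defending": defending}


toss_rate = toss_win_rate(sample_results)
split = chase_defend_counts(sample_results)
print(f"Toss winner also won: {toss_rate:.0%}")
print(split)
```

A win margin expressed in wickets implies the winner batted second, which is why it is used here as the marker for a successful chase.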

Conclusion

I would like to thank The Spark Foundation for giving me the opportunity to complete this task as part of its internship program.