Great stories and great visual effects
Author: Ruturaj Kiran Vaidya
├── LICENSE
├── README.md <- The top-level README for developers using this project.
├── data
│ ├── external <- Data from third party sources.
│ ├── interim <- Intermediate data that has been transformed.
│ ├── processed <- The final, canonical data sets for modeling.
│ └── raw <- The original, immutable data dump.
│
├── notebooks <- Jupyter notebooks. Naming convention is a number (for ordering),
│ the creator's initials, and a short `-` delimited description, e.g.
│ `1.0-jqp-initial-data-exploration`.
│
├── reports <- Generated analysis as HTML, PDF, LaTeX, etc.
│ └── figures <- Generated graphics and figures to be used in reporting
Project based on the cookiecutter data science project template. #cookiecutterdatascience
If you are only interested in the notebook, go to (GitHub has problems rendering notebooks):
All the graphs are plotted using matplotlib and plotly.
- Load the dataset and see how storytelling and visualization help describe the dataset and find additional features
- Build 3 or more visualizations
I selected the soccer dataset used in previous projects, as it has a lot of features to work with and it will be great for visualizations.
Dataset: https://github.com/fivethirtyeight/data/tree/master/soccer-spi
Specific Dataset Link: https://projects.fivethirtyeight.com/soccer-api/club/spi_matches.csv
Unique idea (apart from the project aim): In addition to what was instructed, I decided to use two different visualization libraries, matplotlib and plotly, to compare their graphs and the code they require. Also, it's so much fun!
First, I decided to go with the soccer dataset because I thought it would be great for visualizations, and as I am a big football (soccer) fan, I thought it would be a fun personal project. After selecting the dataset, I decided to work on specific features and ignore the others. I also decided to visualize using two libraries, matplotlib and plotly, so that I could compare the visualizations and the code required to plot them. The following are some of the visualizations:
Plotted using matplotlib:
Plotted using plotly:
While plotting these, I thought of looking at the home and away win percentages, i.e. whether it really matters if you play at home or away. From the following plot, it seems that it does matter to some extent, although it is far from impossible to win when playing away.
Plotted using plotly:
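The home and away win percentages discussed above can be computed in a few lines. The following is a minimal sketch, not the notebook's actual code: it assumes the CSV has `score1` and `score2` columns with `team1` as the home side, the function name `win_percentages` is hypothetical, and the sample rows are made up for illustration.

```python
import csv
import io


def win_percentages(rows):
    """Return home-win, draw, and away-win shares (in percent) for played matches.

    Assumes each row is a dict with 'score1' (home goals) and 'score2'
    (away goals); rows with a missing score are treated as unplayed and skipped.
    """
    counts = {"home": 0, "draw": 0, "away": 0}
    for row in rows:
        s1, s2 = row.get("score1"), row.get("score2")
        if not s1 or not s2:
            continue  # fixture has no result yet
        home_goals, away_goals = float(s1), float(s2)
        if home_goals > away_goals:
            counts["home"] += 1
        elif home_goals < away_goals:
            counts["away"] += 1
        else:
            counts["draw"] += 1
    total = sum(counts.values())
    if total == 0:
        return {key: 0.0 for key in counts}
    return {key: round(100 * value / total, 1) for key, value in counts.items()}


# Made-up sample in the assumed column layout (not real results).
SAMPLE = """team1,team2,score1,score2
A,B,2,1
C,D,0,0
E,F,1,3
G,H,3,0
I,J,,
"""

shares = win_percentages(csv.DictReader(io.StringIO(SAMPLE)))
print(shares)
```

To run this on the real data, pass `csv.DictReader(open("spi_matches.csv", newline=""))` instead of the sample, assuming the file has been downloaded locally. The resulting dictionary can then be handed to either plotting library as a bar or pie chart.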
By comparing the two visualization libraries, I observed that plotly gives better results with less (or rather, simpler) code.
I got some of the ideas from the following post:
- Kaggle data analysis: https://www.kaggle.com/pavanraj159/european-football-data-analysis/notebook
MIT