Effective Data Visualization
PyCon 2020
This tutorial was presented as part of PyCon 2020 (https://us.pycon.org). The presentation page is https://us.pycon.org/2020/schedule/presentation/74/ The tutorial was delivered fully online because PyCon moved to an online format due to COVID-19. The video is published at https://www.youtube.com/watch?v=cNXioWhEeJc
Author
This tutorial is written by Husni Almoubayyed.
Please contact me with any suggestions, comments, or discussions.
Please open a GitHub issue if you find anything that needs correcting, adding, or updating as time goes by.
You can find up-to-date ways to contact me at https://husni.space
About
From picking the right plot for a particular type of data, statistic, or result, to pre-processing sophisticated datasets, to making important decisions about the aesthetics of a figure, visualization is both a science and an art that takes knowledge and practice to master.
This tutorial is for Python users who are familiar with basic plotting and want to build strong visualization skills that will let them effectively communicate any data, statistic, or result.
We will use Python libraries such as seaborn, matplotlib, plotly, and sklearn, and discuss topics such as density estimation, dimensionality reduction, exploring similar datasets, interactive plotting, and making suitable choices for communication. Examples are drawn from datasets in the scientific, financial, and geospatial (mapping) fields, and more.
Installation
In your terminal, run
git clone https://github.com/hsnee/PyCon2020_DataVisualizationTutorial.git
to download the content of this tutorial.
After that, change into the cloned directory (PyCon2020_DataVisualizationTutorial) and run
jupyter notebook Effective_Data_Visualization_Workbook.ipynb
This assumes you have Python (version >3.7) and Jupyter installed. If not, an easy way to get both, along with most of the libraries needed for the tutorial, is to install the latest Anaconda distribution (https://anaconda.com) for your system. Alternatively, visit https://bit.ly/PyConViz20 for a Google Colab-hosted version of this tutorial (or https://bit.ly/PyConViz20Full for the full version with all the solutions).
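If you are unsure whether your local environment is ready, a quick check along the following lines may help. This is only a sketch and not part of the tutorial materials: the helper names (`missing_packages`, `python_is_recent`) are hypothetical, the package list is taken from the libraries named in the About section, and the minimum version used here is an assumed reading of the ">3.7" requirement.

```python
import importlib.util
import sys

# Import names of the libraries mentioned in the About section.
# This list is an assumption; the workbook may need additional packages.
REQUIRED = ["seaborn", "matplotlib", "plotly", "sklearn"]


def missing_packages(names):
    """Return the names that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_is_recent(minimum=(3, 7)):
    """Check the running interpreter against an assumed minimum version."""
    return sys.version_info[:2] >= minimum


if __name__ == "__main__":
    if not python_is_recent():
        print("Python is older than the assumed minimum version.")
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages were found.")
```

If anything is reported missing, installing Anaconda or using the Colab-hosted version avoids having to set up each package by hand.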
Acknowledgement
Many people have helped shape my understanding of data visualization over the years, whether through conversations with them, being taught by them directly, or being influenced by their work. I am indebted to the LSSTC Data Science Fellowship Program (in particular, Adam Miller and Lucianne Walkowicz), Jake Vanderplas, and Skipper Seabold, and to my supervisor Rachel Mandelbaum for many helpful comments and discussions over the years.