The overall goal of these exercises is to put into practice some of the concepts we learnt during the lectures, and to use some of the tools we rely on daily as data scientists.
Remember to select the LCG 102 option from the software stack drop-down menu when you 'spawn' the machine on SWAN.
git clone https://github.com/eamonnmag/CERN-CSC-2022.git
In these exercises we look at:
- Visual Exploration of a Dataset - using visualization to explore data and tell a story about the interesting insights we find in it. This will be performed using:
- How to create visualizations using these tools for visualization of distributions, correlations, identifying outliers, etc.
- How to customize visualizations to make them more coherent by removing noise from plots, such as distracting lines and axis boundaries.
- For Altair, how to build a complex dashboard-like visualization in Jupyter.
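To give a flavour of the first two points, here is a minimal sketch (not taken from the exercise notebooks) that plots a distribution, flags outliers, and then strips visual noise from the figure. It assumes Matplotlib and NumPy are available in the selected software stack. The `ratings` array is synthetic stand-in data invented for illustration, and the 1.5 x IQR rule is just one common convention for flagging outliers.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in data (hypothetical, not the tutorial dataset):
# 500 roughly normal "ratings" plus three deliberately extreme values.
rng = np.random.default_rng(seed=0)
ratings = np.concatenate([rng.normal(loc=66, scale=7, size=500), [20.0, 25.0, 99.0]])

# Flag outliers with the 1.5 x IQR rule (the same convention box plots use).
q1, q3 = np.percentile(ratings, [25, 75])
iqr = q3 - q1
outliers = ratings[(ratings < q1 - 1.5 * iqr) | (ratings > q3 + 1.5 * iqr)]

fig, ax = plt.subplots(figsize=(6, 3))
ax.hist(ratings, bins=30, color="steelblue")

# Mark the flagged values along the x-axis so they stand out from the histogram.
ax.plot(outliers, np.zeros_like(outliers), marker="|", linestyle="none",
        color="crimson", markersize=12, label="outliers (1.5 x IQR)")

# Remove visual noise: hide the top and right axis boundaries and the tick marks.
for side in ("top", "right"):
    ax.spines[side].set_visible(False)
ax.tick_params(length=0)

ax.set_xlabel("rating")
ax.set_ylabel("count")
ax.legend(frameon=False)
```

In a Jupyter notebook the figure renders when the cell finishes. Hiding spines and tick marks is a small change, but it keeps the reader's attention on the shape of the distribution and the flagged points.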
The core exercises are all in the static visualization section, since this is what most people use when producing figures. Static visualizations are also generally more scalable, which is of particular importance when dealing with huge datasets.
The interactive visualization section is more for those who are already well versed in Matplotlib and Seaborn, and who want to extend their knowledge.
Thanks to the creator of the FIFA Kaggle Data set, and the SWAN team @ CERN for helping me in preparing this tutorial!