Data visualization with Python. Personal learning repository.

Primary Language: Jupyter Notebook

python-viz

A very subjective collection of Jupyter notebooks exploring data visualization techniques using Python. It documents the outcome of my Space Time project around learning (a bit of) Python, Pandas, Seaborn, Altair and, last but not least, Eland.

The aim of this endeavour is to be able to work more closely with the data team and better understand their work. Understanding the Python/notebook ecosystem gives us in the UI team an opportunity to directly pick up the work the data team does and prototype visualizations and UI based on it. With Seaborn in particular, we can very quickly come up with visualizations based on the data frame analytics data we have in Elasticsearch. This kind of quick prototyping could act as a first stage for evaluating possible UIs for the Kibana ML plugin.

Another thing I wanted to evaluate is whether a data scientist could create and design visualizations in Python notebooks and then deploy them in Kibana. I didn't get to the point of automating this, but I made some promising findings. The Altair visualization library can be used to create Vega-based visualizations in notebooks. Altair can export the underlying Vega-Lite specs, which can then be used in Kibana via its Vega plugin. Fetching the data needs manual adaptation, but this way a visualization can be deployed from a Python notebook to a Kibana dashboard and look exactly the same.
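
The manual data adaptation could look roughly like the following. This is a minimal sketch and not code from the notebooks: it assumes a Vega-Lite spec that Altair exported as a dict (for example via `chart.to_dict()`), and swaps the inline data for an Elasticsearch query block of the kind Kibana's Vega plugin reads. The helper name `to_kibana_spec`, the index name `my-index`, the query size and the hand-written stand-in spec are all hypothetical.

```python
import copy
import json


def to_kibana_spec(spec, index, size=1000):
    """Return a copy of a Vega-Lite spec that reads its rows from Elasticsearch.

    Hypothetical helper: `index` and `size` are placeholders for whatever
    the target Kibana instance actually needs.
    """
    kibana_spec = copy.deepcopy(spec)
    # Altair keeps inline rows under a top-level "datasets" key; drop them
    kibana_spec.pop("datasets", None)
    kibana_spec["data"] = {
        "url": {
            "%context%": True,  # apply the dashboard's context filters
            "index": index,
            "body": {"size": size},
        },
        # Use the array of search hits as the data rows
        "format": {"property": "hits.hits"},
    }
    return kibana_spec


# Hand-written stand-in for a spec exported from Altair
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
    "data": {"name": "data-1"},
    "datasets": {"data-1": [{"x": 1, "y": 2}]},
    "mark": "point",
    "encoding": {
        "x": {"field": "x", "type": "quantitative"},
        "y": {"field": "y", "type": "quantitative"},
    },
}

kibana_spec = to_kibana_spec(spec, index="my-index")
print(json.dumps(kibana_spec, indent=2))
```

Because each search hit nests the document under `_source`, the field names in the encodings would likely need a `_source.` prefix as well (for example `_source.x`); that step is left out of the sketch.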

A note on the notebooks: not all of them render completely on GitHub. For example, the Altair examples need to be run in a notebook to render correctly.

altair_geo

altair_geo/altair_geo.ipynb

Population Distribution

datasaurus

datasaurus/datasaurus.ipynb

Datasaurus Scatterplot

MIYO IoT Sensor Data

MIYO IoT Sensor Scatterplot Matrix

MIYO IoT Sensor SHAP Feature Influence

MIYO IoT Sensor Time Series

  • Data based on the MIYO IoT Weather & Irrigation sensor. The notebooks demonstrate data manipulation of sparse source data to make it suitable for e.g. regression analysis.
  • The miyo.ipynb notebook uses Seaborn for scatterplot matrices and SHAP to visualize feature influence.
  • The miyo_altair.ipynb notebook uses Altair to replicate the scatterplots and additionally demonstrates how to visualize time series sensor data as small multiples.
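
As an illustration of the sparse-data handling mentioned above, here is a minimal sketch using made-up readings and hypothetical column names (`temperature`, `moisture`); it is not the notebooks' actual code. The sensors report at different times, so most cells are empty. Resampling onto a regular grid and interpolating the remaining gaps yields complete rows that can be fed into a regression.

```python
import numpy as np
import pandas as pd

# Made-up readings: each sensor reports at its own times, so most cells are empty
raw = pd.DataFrame(
    {
        "temperature": [20.0, np.nan, 22.0, 24.0, np.nan],
        "moisture": [np.nan, 40.0, np.nan, np.nan, 36.0],
    },
    index=pd.to_datetime(
        [
            "2020-01-01 00:00",
            "2020-01-01 00:20",
            "2020-01-01 01:00",
            "2020-01-01 02:00",
            "2020-01-01 02:30",
        ]
    ),
)

# Align all sensors on an hourly grid (mean of the readings within each hour)
hourly = raw.resample("60min").mean()

# Fill the remaining gaps by interpolating over time, then drop rows
# that are still incomplete
clean = hourly.interpolate(method="time").dropna()
print(clean)
```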

seaborn_annotations

seaborn_annotations/seaborn_annotations.ipynb

Before (seaborn default)

Seaborn Original Boxplot

After (revamped styling & annotations)

Seaborn Annotations

  • Demonstrates how to redesign a standard Seaborn boxplot into a more compelling version that draws attention to certain data points using highlighting and annotations. Building on that, it shows how to wrap a custom visualization in a helper function and create a reusable module out of it.
  • Data Source: Spotify Charts Top 50 2018, License "Data files © Original Authors".
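
The idea of wrapping a restyled, annotated boxplot in a helper function can be sketched as follows. This is an illustrative sketch under stated assumptions, not the notebook's implementation: the function name `annotated_boxplot`, its parameters, the colors and the tiny made-up dataset are all hypothetical, and it assumes the category column holds plain strings (so Seaborn places boxes in order of first appearance).

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch also runs outside a notebook

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def annotated_boxplot(data, x, y, label, highlight, ax=None):
    """Draw a muted boxplot and annotate rows whose `label` value is in `highlight`."""
    if ax is None:
        _, ax = plt.subplots(figsize=(8, 4))
    # Muted boxes so the highlighted points stand out
    sns.boxplot(data=data, x=x, y=y, color="lightgray", ax=ax)
    sns.despine(ax=ax)
    # Boxes sit at integer positions 0..n-1 in order of first appearance
    categories = list(pd.unique(data[x]))
    for _, row in data[data[label].isin(highlight)].iterrows():
        pos = categories.index(row[x])
        ax.scatter(pos, row[y], color="crimson", zorder=3)
        ax.annotate(
            row[label],
            xy=(pos, row[y]),
            xytext=(8, 0),
            textcoords="offset points",
            va="center",
        )
    return ax


# Made-up data, only loosely shaped like a chart listing
tracks = pd.DataFrame(
    {
        "genre": ["pop", "pop", "pop", "pop", "rap", "rap", "rap", "rap"],
        "danceability": [0.55, 0.60, 0.65, 0.95, 0.70, 0.75, 0.80, 0.30],
        "track": ["A", "B", "C", "D", "E", "F", "G", "H"],
    }
)

ax = annotated_boxplot(
    tracks, x="genre", y="danceability", label="track", highlight=["D", "H"]
)
```

Saved in its own `.py` file, a function like this can be imported from any notebook, which is one simple way to get a reusable module.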

shap_cars

shap_cars/shap_cars.ipynb

Cars Dataset Scatterplot

  • shap_cars.ipynb demonstrates how to tweak a scatterplot matrix to show only the value distributions relevant for regression analysis: a single row for the target attribute, with one chart for each attribute used in the analysis. Note that the dataset itself could be transformed to get more interesting results; here it just serves to document a very basic workflow of visualizing source data with Seaborn, running a regression analysis on it and visualizing the results using the SHAP library.
  • Dataset: Vehicle dataset from Cardekho, License unknown.
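
A single-row scatterplot matrix like the one described above can be produced with the `x_vars` and `y_vars` arguments of Seaborn's `pairplot`. The following is a minimal sketch on synthetic data; the column names (`year`, `km_driven`, `seats`, `selling_price`) and the generated values are assumptions for illustration and are not taken from the notebook.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch also runs outside a notebook

import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for a cars dataset (hypothetical columns and values)
rng = np.random.default_rng(0)
cars = pd.DataFrame(
    {
        "year": rng.integers(2005, 2020, size=50),
        "km_driven": rng.integers(5_000, 150_000, size=50),
        "seats": rng.integers(2, 8, size=50),
    }
)
cars["selling_price"] = (
    1000 * (cars["year"] - 2000)
    - 0.01 * cars["km_driven"]
    + rng.normal(0, 500, size=50)
)

features = ["year", "km_driven", "seats"]

# One row (the regression target) and one column per feature,
# instead of the full feature-by-feature matrix
grid = sns.pairplot(cars, x_vars=features, y_vars=["selling_price"])
```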

eland

eland/eland.ipynb

Cars Dataset Scatterplot

eland.ipynb demonstrates basic usage of eland to fetch data from Elasticsearch into a Pandas data frame and render it with Altair.

anomaly_detection

anomaly_detection/anomaly_detection.ipynb

Cars Dataset Scatterplot

anomaly_detection.ipynb replicates the Elasticsearch Machine Learning plugin's swimlanes using Altair in Python.