A very subjective collection of Jupyter notebooks to explore data visualization techniques using Python. Documents the outcome of my Space Time project around learning (a bit of) Python, Pandas, Seaborn, Altair and last but not least Eland.
The aim of this endeavour is to work more closely with the data team and better understand their work. Understanding the Python/notebook ecosystem gives us in the UI team an opportunity to directly pick up the data team's work and prototype visualizations and UI based on it. With Seaborn especially, we can very quickly come up with visualizations based on the data frame analytics data we have in Elasticsearch. This kind of quick prototyping could act as a first stage for evaluating possible UIs for the Kibana ML plugin.
Another thing I wanted to evaluate is whether it's possible to come up with a workflow where a data scientist creates and designs visualizations in Python notebooks and then deploys them in Kibana. I didn't get close to a stage where this would be automated, but at least I made some promising findings. The Altair visualization library can be used to create Vega-based visualizations in notebooks. Altair can export Vega specs, and those can then be used in Kibana via its Vega plugin. Fetching the data needs manual adaptation, but this way it's possible to deploy visualizations that look exactly the same in Python notebooks and in Kibana dashboards.
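As a rough illustration of the manual data adaptation mentioned above, here is a minimal sketch that uses only the Python standard library. It assumes the spec was exported from Altair as a dictionary (for example via `chart.to_dict()`), which yields a Vega-Lite spec with the data frame embedded under `datasets`. The helper name `use_elasticsearch_data`, the index name `my-index`, and the hand-written example spec are hypothetical and not taken from the notebooks. The `data.url` object follows the query shape that Kibana's Vega plugin accepts in place of a plain URL.

```python
import copy
import json


def use_elasticsearch_data(spec, index, size=1000):
    """Return a copy of a Vega-Lite spec that reads its data from Elasticsearch.

    Hypothetical helper: it swaps the data embedded by Altair for a
    Kibana-style Elasticsearch query and leaves the input spec untouched.
    """
    adapted = copy.deepcopy(spec)
    # Altair embeds the data frame under "datasets"; drop the inline copy.
    adapted.pop("datasets", None)
    # In Kibana, "url" can be an object describing an Elasticsearch query.
    adapted["data"] = {
        "url": {"index": index, "body": {"size": size}},
        # Point Vega at the list of hits inside the search response.
        "format": {"property": "hits.hits"},
    }
    return adapted


# Hand-written stand-in for a spec exported from Altair (assumption).
exported_spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
    "data": {"name": "data-example"},
    "datasets": {"data-example": [{"x": 1, "y": 2}, {"x": 3, "y": 4}]},
    "mark": "point",
    "encoding": {
        "x": {"field": "x", "type": "quantitative"},
        "y": {"field": "y", "type": "quantitative"},
    },
}

kibana_spec = use_elasticsearch_data(exported_spec, index="my-index")
print(json.dumps(kibana_spec, indent=2))
```

Because Elasticsearch returns each document wrapped in a hit, the field references in `encoding` would most likely also need adjusting (for example `_source.x` instead of `x`). That step is left out of the sketch.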
A note on the notebooks: Not all of them render completely on GitHub; for example, the Altair examples need to be run in a notebook to render correctly.
- Demonstrates synced views using Altair.
- Data Source: World Cities Database by Simple Maps, licensed under CC BY 4.0.
- Basic visualizations based on the Datasaurus dataset. Includes different techniques to customize Seaborn scatterplots. Also demonstrates how to create interface elements to control visualizations with Altair.
- Based on Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing (Paper)
- Data Source: Dataset & Images (ZIP)
- Data based on the MIYO IoT Weather & Irrigation sensor. The notebooks demonstrate data manipulation of sparse source data to make it suitable for e.g. regression analysis.
- The miyo.ipynb notebook uses Seaborn for scatterplot matrices and SHAP to visualize feature influence.
- The miyo_altair.ipynb notebook uses Altair to replicate the scatterplots and additionally demonstrates how to visualize time series sensor data as small multiples.
seaborn_annotations/seaborn_annotations.ipynb
- Demonstrates how to redesign a standard Seaborn boxplot into a more compelling version that draws attention to certain data points using highlighting and annotations. It then shows how to wrap the custom visualization in a helper function and turn it into a reusable module.
- Data Source: Spotify Charts Top 50 2018, License "Data files © Original Authors".
- shap_cars.ipynb demonstrates how to tweak a scatterplot matrix to only show the value distributions relevant for regression analysis, with one row for the target attributes and a chart for each attribute used for the analysis. Note that the dataset itself could be transformed to get more interesting results; it was only used here to document a very basic workflow: visualize source data with Seaborn, run regression analysis on it, and visualize the results using the SHAP library.
- Dataset: Vehicle dataset from Cardekho, License unknown.
eland.ipynb demonstrates basic usage of Eland to fetch data from Elasticsearch into a Pandas data frame and render it with Altair.
anomaly_detection/anomaly_detection.ipynb
anomaly_detection.ipynb replicates the Elasticsearch Machine Learning plugin's swimlanes using Altair in Python.