Hands-on tutorials for basic visualization techniques and the necessary data processing
Information visualization is concerned with the visual and interactive representation of abstract and possibly complex datasets. As we encounter growing datasets in various sectors, there is an increasing need to develop effective methods for making sense of data. Information visualization relies on computational means and our perceptual system to help reveal otherwise invisible patterns and gain new insights. Across various fields, there is great hope in the power of visualization to turn complex data into informative, engaging, and maybe even attractive forms. However, it typically takes several steps of data preparation and processing before a given dataset can be meaningfully visualized. While visualizations can indeed provide novel and useful perspectives on data, they can also obscure or misrepresent certain aspects of a phenomenon. Thus, it is essential to develop a critical literacy towards the rhetoric of information visualization. One of the best ways to develop this literacy is to learn how to create visualizations! The tutorials offer a practical approach to working with data and creating interactive visualizations.
The tutorials require basic familiarity with statistics and programming. They come as Jupyter notebooks containing both human-readable explanations and executable code. The code blocks in the tutorials are written in Python, which you should either already have some experience with or be keenly curious about. The tutorials make frequent use of the data analysis library Pandas, the visualization library Altair, and a range of other packages. You can view the tutorials as webpages, open and run them on Google Colab, or download the Jupyter notebook files to edit and run them locally.
The first three tutorials lay the groundwork, after which five common data structures are covered:
- Getting started (Colab)
- Data wrangling (Colab)
- Interaction techniques (Colab)
- Temporal analysis (Colab)
- Text processing (Colab)
- Many dimensions (Colab)
- Network analysis (Colab)
- Geovisualization (Colab)
The tutorials were created for the Information Visualization course at Fachhochschule Potsdam during the summer semester 2020. Many thanks to Fidel Thomet, Jonas Parnow et al. at UCLAB for feedback and fixes, and to the many generous creators of the various open source software packages used throughout the tutorials.
The notebooks are released under the Creative Commons Attribution license (CC BY 4.0).
If you encounter any errors or have any suggestions for improvement, feel free to send an email, or fork this repository and send a pull request.