Visual Insights

Automatic insights extraction and visualization specification in data analysis.

Introduction

visual insights helps you extract insights from data warehouse automatically and generate visualization with interesting findings.

Now, there are three main modules in visual-insights application.

DataSource

dataSource board is for data uploading and defining fields type(dimensions, measures). In visual insights, we regard dimensions as independent variable and measures as dependent variable.

Notebook

Notebook is a board for user to know what happened in the automatic analysis process. It shows how decisions are made by the application and provide interactive interface to adjust some of the parameters and operators used by the algorithm.

Gallery

Gallery displays parts of the visualization with interesting findings. There are some settings here to adjust some of the visual elements in the chart.

Examples

DataSet: NASA - Kepler

Details of the test result can be accessed here

Usage

try online demo Here
use the whole application npm run ui.
or run yarn workspace frontend start for front end dev, run yarn work space backend dev for backend.

only use the algorithm package. (/packages/visual-insights)

npm i visual-insights --save

How does it work

The working process are visualized in notebook board in the application. *** Main process of the algorithm is shown in the notebook board. ***

Univariate summary

For the first step, visual-insights analyze all the fields in the dataset independently. It gets the fields' distributions and calculate its entropy. Besides, it will define a semantic type (quantitative, ordinal, temporal, nominal) for each field. More details of the field will be displayed when hover your mouse on the fields.

Then, it will find the fields with high entropy and try to reduce it by grouping the field (for example). Only dimensions participates this process.

Subspaces

In this step, visual insights search the combination of fields, and calculate the entropy of each measure with some aggregation operators.

Correlation

After one subspace is specified (try to click one row of subsapce in notebook), it will analyze the correlation of measures in the space.

Clustering

It helps you to cluster all the measures based on their correlation. It puts all the variables who are strongly related together to make a specific view (with specified dimenions).

Click one group of measures and visualization will be shown at the bottom of the page.

Specification & Visualization

specification

visualization.

Documentation

doc for core-algorithm: visual-insights
doc for react components: todos

Citation

T. Sellam, E. Müller and M. Kersten, "Semi-automated exploration of data warehouses," in 2015, . DOI: 10.1145/2806416.2806538.
Wongsuphasawat, Kanit, et al. "Voyager 2: Augmenting visual analysis with partial view specifications." Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems. ACM, 2017.
B. Tang et al, "Extracting top-K insights from multi-dimensional data," in 2017, . DOI: 10.1145/3035918.3035922.
A. Satyanarayan, K. Wongsuphasawat and J. Heer, "Declarative interaction design for data visualization," in 2014, . DOI: 10.1145/2642918.2647360.
P. Tan, M. Steinbach and V. Kumar, Introduction to Data Mining. (Pearson New International First ed.) Essex: Pearson, 2014.

HuoXiaoYe/Rath