PyGWalker Cloud is released! You can now save your charts to the cloud, publish an interactive cell as a web app, and use advanced GPT-powered features. Check out PyGWalker Cloud for more details.

PyGWalker: A Python Library for Exploratory Data Analysis with Visualization

PyGWalker can simplify your Jupyter Notebook data analysis and data visualization workflow by turning your pandas or polars dataframe into a Tableau-style user interface for visual exploration.

PyGWalker (pronounced like "Pig Walker", just for fun) is named as an abbreviation of "Python binding of Graphic Walker". It integrates Jupyter Notebook (or other jupyter-based notebooks) with Graphic Walker, a different type of open-source alternative to Tableau. It allows data scientists to analyze data and visualize patterns with simple drag-and-drop operations.

Visit Google Colab, Kaggle Code or Graphic Walker Online Demo to test it out!

If you prefer using R, you can check out GWalkR now!

Getting Started

Setup pygwalker

Before using pygwalker, make sure to install the packages through the command line using pip or conda.

pip

pip install pygwalker

Note

For an early trial, you can install with pip install pygwalker --upgrade to keep your version up to date with the latest release, or even pip install pygwalker --upgrade --pre to obtain the latest features and bug fixes.

Conda-forge

conda install -c conda-forge pygwalker

mamba install -c conda-forge pygwalker

See conda-forge feedstock for more help.
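After installing, it can be useful to confirm which version ended up in the active environment. The following is a minimal sketch that uses only the Python standard library; the helper name `installed_version` is made up for this example and is not part of pygwalker.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("pygwalker")
    if found is None:
        print("pygwalker is not installed in this environment")
    else:
        print(f"pygwalker {found} is installed")
```

If the function returns None inside a notebook, the notebook kernel is probably using a different environment from the one where pip or conda was run.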

Use pygwalker in Jupyter Notebook

Quick Start

Import pygwalker and pandas to your Jupyter Notebook to get started.

import pandas as pd
import pygwalker as pyg

You can use pygwalker without breaking your existing workflow. For example, you can call up PyGWalker with the dataframe loaded in this way:

df = pd.read_csv('./bike_sharing_dc.csv')
walker = pyg.walk(df)

That's it. Now you have an interactive UI to analyze and visualize data with simple drag-and-drop operations.

Cool things you can do with PyGWalker:

You can change the mark type to make different charts, for example, a line chart.
To compare different measures, you can create a concat view by adding more than one measure into rows/columns.
To make a facet view of several subviews divided by the values of a dimension, put dimensions into rows or columns. The rules are similar to Tableau.
You can view the data frame in a table and configure the analytic types and semantic types.
You can save the data exploration result to a local file.

For more detailed instructions, visit the Graphic Walker GitHub page.

Better Practice

There are some important parameters you should know when using pygwalker:

spec: saves and loads the chart configuration (a JSON string or a file path).
use_kernel_calc: uses DuckDB as the computing engine, which lets you handle larger datasets faster on your local machine.

df = pd.read_csv('./bike_sharing_dc.csv')
walker = pyg.walk(
    df,
    # Chart state is saved to this JSON file. Click the save button in the UI
    # manually when you finish a chart; autosave will be supported in the future.
    spec="./chart_meta_0.json",
    # Use DuckDB as the computing engine to explore bigger datasets (<= 100 GB).
    use_kernel_calc=True,
)
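The two parameters above can be assembled in one place before calling `pyg.walk`. The sketch below is only an illustration: `build_walk_kwargs` and the row threshold are hypothetical and are not part of pygwalker. It checks that an existing spec file still parses as JSON and switches on `use_kernel_calc` only for larger dataframes.

```python
import json
from pathlib import Path
from typing import Any, Dict

# Hypothetical cutoff chosen for this sketch; pygwalker documents no such value.
KERNEL_CALC_ROW_THRESHOLD = 1_000_000


def build_walk_kwargs(spec_path: str, n_rows: int) -> Dict[str, Any]:
    """Assemble the `spec` and `use_kernel_calc` arguments for pyg.walk."""
    path = Path(spec_path)
    if path.exists() and path.stat().st_size > 0:
        # Fail early with a clear message if a saved config was corrupted.
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            raise ValueError(f"{spec_path} is not valid JSON: {exc}") from exc
    return {
        "spec": spec_path,
        "use_kernel_calc": n_rows >= KERNEL_CALC_ROW_THRESHOLD,
    }
```

With a dataframe `df` already loaded, this could be used as `pyg.walk(df, **build_walk_kwargs("./chart_meta_0.json", len(df)))`.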

Example in local notebook

Notebook Code: Click Here
Preview Notebook Html: Click Here

Example in cloud notebook

Use pygwalker in Streamlit

Streamlit allows you to host a web version of pygwalker without having to work out the details of how web applications work.

Here is an example app built with pygwalker and Streamlit:

import pandas as pd
import streamlit.components.v1 as components
import streamlit as st
from pygwalker.api.streamlit import init_streamlit_comm, get_streamlit_html

st.set_page_config(
    page_title="Use Pygwalker In Streamlit",
    layout="wide"
)

st.title("Use Pygwalker In Streamlit(support communication)")

# Initialize pygwalker communication
init_streamlit_comm()

# When using `use_kernel_calc=True`, you should cache your pygwalker html, if you don't want your memory to explode
@st.cache_resource
def get_pyg_html(df: pd.DataFrame) -> str:
    # Set `debug=False` when you publish the application, so that other users cannot write to your config file.
    # Set `debug=True` if you want to use the chart-config saving feature.
    html = get_streamlit_html(df, spec="./gw0.json", use_kernel_calc=True, debug=False)
    return html

@st.cache_data
def get_df() -> pd.DataFrame:
    return pd.read_csv("/bike_sharing_dc.csv")

df = get_df()

components.html(get_pyg_html(df), width=1300, height=1000, scrolling=True)
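Because `debug` should differ between local development and a published app, one option is to derive it from the environment instead of hard-coding it. This is a minimal sketch; the variable name `PYGWALKER_APP_DEBUG` and the helper `resolve_debug_flag` are assumptions made for this example, and pygwalker itself does not read that variable.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable name chosen for this sketch; pygwalker does not read it.
DEBUG_ENV_VAR = "PYGWALKER_APP_DEBUG"


def resolve_debug_flag(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when the environment explicitly opts in to debug mode."""
    source = os.environ if env is None else env
    value = source.get(DEBUG_ENV_VAR, "").strip().lower()
    return value in {"1", "true", "yes"}
```

In the app above, the result could be passed as `debug=resolve_debug_flag()` to `get_streamlit_html`. Defaulting to False means a deployment that forgets to set the variable stays in the safer, read-only mode.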

API Reference

pygwalker.walk

Parameter	Type	Default	Description
dataset	Union[DataFrame, Connector]	-	The dataframe or connector to be used.
gid	Union[int, str]	None	ID for the GraphicWalker container div, formatted as 'gwalker-{gid}'.
env	Literal['Jupyter', 'Streamlit', 'JupyterWidget']	'JupyterWidget'	Environment using pygwalker.
fieldSpecs	Optional[Dict[str, FieldSpec]]	None	Specifications of fields. Will be automatically inferred from `dataset` if not specified.
hideDataSourceConfig	bool	True	If True, hides DataSource import and export button.
themeKey	Literal['vega', 'g2']	'g2'	Theme type for the GraphicWalker.
dark	Literal['media', 'light', 'dark']	'media'	Theme setting. 'media' will auto-detect the OS theme.
return_html	bool	False	If True, returns the result as an HTML string.
spec	str	""	Chart configuration data. Can be a configuration ID, JSON, or remote file URL.
use_preview	bool	True	If True, uses the preview function.
store_chart_data	bool	False	If True and `spec` is a JSON file, saves the chart to disk.
use_kernel_calc	bool	False	If True, uses kernel computation for data.
**kwargs	Any	-	Additional keyword arguments.
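Several parameters in the table accept only a fixed set of string values. The sketch below shows one way to catch a mistyped value before it reaches `pyg.walk`. The helper `check_walk_options` is hypothetical and not part of pygwalker; the allowed values are copied from the table above.

```python
from typing import Any, Dict

# Allowed values copied from the parameter table above.
ALLOWED_VALUES = {
    "env": {"Jupyter", "Streamlit", "JupyterWidget"},
    "themeKey": {"vega", "g2"},
    "dark": {"media", "light", "dark"},
}


def check_walk_options(**options: Any) -> Dict[str, Any]:
    """Return `options` unchanged, or raise ValueError on an undocumented value."""
    for name, value in options.items():
        allowed = ALLOWED_VALUES.get(name)
        if allowed is not None and value not in allowed:
            raise ValueError(f"{name}={value!r} is not one of {sorted(allowed)}")
    return options
```

A call might then look like `pyg.walk(df, **check_walk_options(dark="light", themeKey="vega"))`. Parameters that are not listed in `ALLOWED_VALUES` pass through unchecked.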

Tested Environments

Configuration And Privacy Policy (pygwalker >= 0.3.10)

$ pygwalker config --help

usage: pygwalker config [-h] [--set [key=value ...]] [--reset [key ...]] [--reset-all] [--list]

Modify configuration file. (default: /Users/douding/Library/Application Support/pygwalker/config.json) 
Available configurations:

- privacy  ['offline', 'update-only', 'events'] (default: events).
    "offline": fully offline, no data is send or api is requested
    "update-only": only check whether this is a new version of pygwalker to update
    "events": share which events about which feature is used in pygwalker, it only contains events data about which feature you arrive for product optimization. No DATA YOU ANALYSIS IS SEND.
    
- kanaries_token  ['your kanaries token'] (default: empty string).
    your kanaries token, you can get it from https://kanaries.net.
    refer: https://space.kanaries.net/t/how-to-get-api-key-of-kanaries.
    by kanaries token, you can use kanaries service in pygwalker, such as share chart, share config.
    

options:
  -h, --help            show this help message and exit
  --set [key=value ...]
                        Set configuration. e.g. "pygwalker config --set privacy=update-only"
  --reset [key ...]     Reset user configuration and use default values instead. e.g. "pygwalker config --reset privacy"
  --reset-all           Reset all user configuration and use default values instead. e.g. "pygwalker config --reset-all"
  --list                List current used configuration.

For more details, see: How to set your privacy configuration?
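The privacy setting can also be applied from a setup script. The sketch below only builds the command line shown in the help text; `privacy_command` is a hypothetical helper, and actually running the command assumes the `pygwalker` executable is on your PATH.

```python
from typing import List

# Privacy levels listed in the help text above.
PRIVACY_LEVELS = ("offline", "update-only", "events")


def privacy_command(level: str) -> List[str]:
    """Build the `pygwalker config --set privacy=<level>` argument list."""
    if level not in PRIVACY_LEVELS:
        raise ValueError(f"privacy must be one of {PRIVACY_LEVELS}, got {level!r}")
    return ["pygwalker", "config", "--set", f"privacy={level}"]
```

To apply the setting, pass the list to `subprocess.run(privacy_command("offline"), check=True)`. Passing a list instead of a single string avoids shell quoting issues.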

License

Apache License 2.0

Resources

Check out more resources about Graphic Walker on Graphic Walker GitHub
We are also working on RATH: open-source, automated exploratory data analysis software that redefines the workflow of data wrangling, exploration, and visualization with AI-powered automation. Check out the Kanaries website and RATH GitHub for more!
Use pygwalker to build a visual analysis app in Streamlit
If you encounter any issues and need support, join our Slack or Discord channels.