Generate openmappr interactive network visualizations from Python.

py2mappr: data visualization

Py2Mappr is a Python library that generates an openmappr player from node and link files.

It provides a simple API to generate a map from a set of nodes and links. The API is designed to be flexible and extensible, and to be used as a step in a pipeline. The output of the library is a self-contained folder with all the resources necessary to render a map.

Installing

At the moment the library is not available on PyPI. There are two ways to install it:

  1. Install from the GitHub repository

     pip install git+https://github.com/vibrant-data-labs/py2mappr
    
  2. Clone the repository and install it as an editable package

     git clone <py2mappr-repo>
     pip install -r _requirements.txt
     cd <project>
     pip install -e <path-to-py2mappr>
    

Dependencies

Supported Python Versions

  • Python 3.8+

External Dependencies

  • numpy
  • pandas
  • tag2network
  • boto3 (optional) - for uploading to s3, interacting with cloudfront distributions
  • requests (optional) - for interacting with cloudflare API

Getting started

Py2Mappr accepts the datapoints and the network data as pandas.DataFrame objects:

    import pandas as pd
    from py2mappr import mappr

    # read the data
    nodes = pd.read_csv("nodes.csv")
    links = pd.read_csv("links.csv")

    # create the map
    mappr.create_map(nodes, links)

    # display player in the browser
    mappr.show()

The example above is the minimum required to generate a map. All other examples are available in the examples folder.
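
To try the snippet above without existing data, the two input files can be generated by hand. The following is a minimal sketch using only the standard library. The column names (id, label, source, target) and the helper name write_sample_csvs are assumptions for illustration; check the examples folder for the columns py2mappr actually expects.

```python
import csv
from pathlib import Path


def write_sample_csvs(folder="."):
    """Write a tiny nodes.csv and links.csv (hypothetical column names)."""
    folder = Path(folder)
    # Assumed schema: one row per node, identified by "id".
    nodes = [
        {"id": "a", "label": "Alpha"},
        {"id": "b", "label": "Beta"},
        {"id": "c", "label": "Gamma"},
    ]
    # Assumed schema: one row per link, referencing node ids.
    links = [
        {"source": "a", "target": "b"},
        {"source": "b", "target": "c"},
    ]
    for name, rows in (("nodes.csv", nodes), ("links.csv", links)):
        with open(folder / name, "w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=list(rows[0]))
            writer.writeheader()
            writer.writerows(rows)
    return folder / "nodes.csv", folder / "links.csv"
```

Calling write_sample_csvs() in the working directory produces files that pd.read_csv("nodes.csv") and pd.read_csv("links.csv") can load.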

Once the map is generated, the output folder will contain all the necessary files which can be served locally using the run_local.sh script. The script is available in the data_out folder. To serve the map locally navigate to the data_out folder in a terminal shell and run the provided utility:

    ./run_local.sh

or, you can simply run a python server at a desired port:

    python -m http.server <PORT_NUM>

The data_out folder

This folder is generated by py2mappr. Following are the contents:

    index.html              # entry point. has references to the openmappr player resources (js/css/images etc)
    data/
        nodes.json          # node data with attributes metadata
        links.json          # links data with attributes metadata
        settings.json       # global settings for the player
    run_local.sh            # simple utility to run a local server
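
In a pipeline it can be useful to confirm that the folder is complete before serving or uploading it. The following is a small sketch that checks for the files listed above; the helper name missing_outputs is hypothetical and not part of py2mappr.

```python
from pathlib import Path

# Files taken from the folder listing above.
EXPECTED_FILES = (
    "index.html",
    "data/nodes.json",
    "data/links.json",
    "data/settings.json",
    "run_local.sh",
)


def missing_outputs(out_dir="data_out"):
    """Return the expected files that are absent from out_dir."""
    root = Path(out_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty list means every file from the listing is present.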

API

For more information about the project and layout configuration, please refer to Layouts Configuration, Project Configuration and Attributes Configuration.

py2mappr contains 2 main modules:

  • mappr
  • publish

mappr

This module contains the main API to generate a map. In order to import the module, use the following:

    from py2mappr import mappr

It also contains the following helper methods:

  • create_map(..): creates a map from the given data
  • create_layout(..): creates a layout object and attaches to the project
  • show(): opens the map in the browser

create_map(..)

This method is the main method to generate a map. It accepts the following arguments:

  • data_frame: DataFrame, required. The data frame of datapoints with its attributes (columns) to be used in the project.
  • network_df: DataFrame, optional. The data frame of edges with its attributes (columns) to be used in the project. Although network_df is optional in this method, the project still expects a network to be set, which can be done later using set_network(..).
  • layout_type: str, optional. The type of the layout to be created as the first layout in the project. The default is "clustered". Available values are: clustered, scatterplot, clustered-scatterplot, geo.
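
The arguments above can be combined as in the following sketch. The wrapper build_map and the constant LAYOUT_TYPES are hypothetical helpers, not part of py2mappr; the keyword names network_df and layout_type come from the list above, and passing them as keywords is an assumption about the signature.

```python
# Layout names copied from the create_map(..) argument list above.
LAYOUT_TYPES = ("clustered", "scatterplot", "clustered-scatterplot", "geo")


def build_map(nodes, links=None, layout_type="clustered"):
    """Hypothetical wrapper: validate layout_type, then call mappr.create_map."""
    if layout_type not in LAYOUT_TYPES:
        raise ValueError(
            f"unknown layout_type {layout_type!r}; expected one of {LAYOUT_TYPES}"
        )
    # Imported lazily so the validation above works without py2mappr installed.
    from py2mappr import mappr

    # Assumption: create_map accepts these documented names as keywords.
    kwargs = {"layout_type": layout_type}
    if links is not None:
        kwargs["network_df"] = links
    return mappr.create_map(nodes, **kwargs)
```

For example, build_map(nodes, links, layout_type="scatterplot") would request a scatterplot as the first layout, while a misspelled layout name fails early with a ValueError.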

create_layout(..)

This method is used to create a layout object and attach it to the project. It accepts the following arguments:

  • data_frame: DataFrame, optional. The data frame of datapoints with its attributes (columns) to be used in the project. Should be provided if there is no current project.

  • layout_type: PLOT_TYPE, optional. The type of the layout to be created. The default is "clustered". The available types are: clustered, scatterplot, clustered-scatterplot, geo.

publish

This module contains the API to publish the map to the local system or cloud storage. In order to import the module, use the following:

    import py2mappr.publish as publisher

It provides the API to run the publishing tasks in predefined order.

Run on local system

    publisher.run([
        publisher.local()
    ])

The call above will build the current project and pass the current project path to the local task. The local task will start the server using http.server and open the map in the browser.
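
For readers who want to see what such a step involves, here is a rough standard-library sketch of serving a folder with http.server and opening it in the browser. It is not the library's implementation; the function name serve_folder and its parameters are assumptions made for illustration.

```python
import functools
import threading
import webbrowser
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def serve_folder(web_dir, port=0, open_browser=True):
    """Serve web_dir over HTTP in a background thread and return (server, url)."""
    # Bind the handler to the folder instead of the current working directory.
    handler = functools.partial(SimpleHTTPRequestHandler, directory=str(web_dir))
    # port=0 lets the operating system pick a free port.
    server = ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_address[1]}/"
    if open_browser:
        webbrowser.open(url)
    return server, url
```

The caller is responsible for stopping the server with server.shutdown() and server.server_close() when finished.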

Publish on S3

    publisher.run([
        publisher.s3("my-bucket-name")
    ])

The call above will build the current project and pass the current project path to the s3 task. The s3 task will fetch the AWS credentials from the config.ini file of the current python project and upload the project to the given bucket.

Publish on S3 and set up CloudFront distribution

    publisher.run([
        publisher.s3("my-bucket-name"),
        publisher.cloudfront("cdn.mydomain.com")
    ])

The call above will build the current project and pass the current project path to the s3 task. The s3 task will fetch the AWS credentials from the config.ini file of the current python project and upload the project to the given bucket. The bucket name will be passed to the cloudfront task, which will create a new distribution for the given bucket with an alias of cdn.mydomain.com.

Expanding the publish flow

Custom tasks can be added to the publish flow. The custom task should be a function that accepts the dict of values returned by the previous task. The following is an example of a custom task:

    def my_custom_task(data):
        print("my custom task")
        print("project path: ", data.get("web_dir"))

    publisher.run([
        my_custom_task
    ])

The following values can be accessed from the data dict:

  • web_dir: the path to the project folder
  • bucket: the name of the s3 bucket, returned from the s3 task
  • cdn_url: the url of the cloudfront distribution, returned from the cloudfront task
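
As a slightly fuller illustration, the sketch below is a custom task that reads web_dir and writes a manifest of the generated files. The task name write_manifest, the manifest.json file, and the manifest key are all hypothetical. Returning a dict that extends the incoming values is an assumption about how values flow to the next task; the example above returns nothing.

```python
import json
from pathlib import Path


def write_manifest(data):
    """Hypothetical custom task: list the files under web_dir in manifest.json."""
    web_dir = Path(data["web_dir"])
    files = sorted(
        path.relative_to(web_dir).as_posix()
        for path in web_dir.rglob("*")
        # Skip the manifest itself so repeated runs give the same result.
        if path.is_file() and path.name != "manifest.json"
    )
    (web_dir / "manifest.json").write_text(json.dumps(files, indent=2))
    # Assumption: the returned dict is what the next task receives.
    return {**data, "manifest": files}
```

It would be passed to publisher.run in the same way as my_custom_task above, for example as the only entry in the task list.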