py2mappr: data visualization
Py2Mappr is a Python library that generates an openmappr player from node and link files.
It provides a simple API to generate a map from a set of nodes and links. The API is designed to be flexible, extensible, and usable in a pipeline. The output of the library is a self-contained folder with all the resources necessary to render a map.
Installing
At the moment the library is not available on PyPI. There are two ways to install it:
- Install from the GitHub repository:
pip install git+https://github.com/vibrant-data-labs/py2mappr
- Clone the repository and install it as an editable package:
git clone <py2mappr-repo>
pip install -r _requirements.txt
cd <project>
pip install -e <path-to-py2mappr>
Dependencies
Supported Python Versions
- Python 3.8+
External Dependencies
- numpy
- pandas
- tag2network
- boto3 (optional) - for uploading to s3, interacting with cloudfront distributions
- requests (optional) - for interacting with cloudflare API
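To see which of the packages listed above are present in the current environment, a small standard-library check can help. This is a minimal sketch: it assumes each package's import name matches the name listed here, and `dependency_report` and `missing` are hypothetical helper names, not part of py2mappr.

```python
from importlib.util import find_spec

# Package names as listed in this README. The optional ones are only
# needed for the S3/CloudFront and Cloudflare publishing steps.
# Assumption: the import name of each package equals its listed name.
REQUIRED = ("numpy", "pandas", "tag2network")
OPTIONAL = ("boto3", "requests")


def dependency_report(names):
    """Map each top-level package name to True if it can be imported."""
    return {name: find_spec(name) is not None for name in names}


def missing(names):
    """Return the package names from `names` that are not importable."""
    return [name for name, ok in dependency_report(names).items() if not ok]


print("missing required:", missing(REQUIRED))
print("missing optional:", missing(OPTIONAL))
```

An empty list for the required packages means the core dependencies can be imported; the optional ones matter only for the publishing tasks that use them.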
Getting started
Py2Mappr accepts the datapoints and network data as pandas.DataFrame objects:
import pandas as pd
from py2mappr import mappr
# read the data
nodes = pd.read_csv("nodes.csv")
links = pd.read_csv("links.csv")
# create the map
mappr.create_map(nodes, links)
# display player in the browser
mappr.show()
The example above is the minimum required to generate a map. More examples are available in the examples folder.
Once the map is generated, the output folder contains all the necessary files, which can be served locally using the run_local.sh script. The script is located in the data_out folder. To serve the map locally, navigate to the data_out folder in a terminal and run the provided utility:
./run_local.sh
Alternatively, run a Python HTTP server on a port of your choice:
python -m http.server <PORT_NUM>
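The same kind of server can also be started from Python, which is convenient inside a pipeline script. The following is a minimal sketch using only the standard library; `make_server` is a hypothetical helper name, and binding to 127.0.0.1 on port 8000 by default is an assumption, not something py2mappr prescribes.

```python
import functools
import http.server


def make_server(directory, port=8000):
    """Build an HTTP server that serves files from `directory` on localhost."""
    # Bind the handler to the given folder instead of the current directory.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(directory)
    )
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
```

Calling `make_server("data_out", 8080).serve_forever()` from the project root would then serve the generated folder until the process is interrupted.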
data_out folder
This folder is generated by py2mappr. Its contents are:
index.html # entry point. has references to the openmappr player resources (js/css/images etc)
data/
nodes.json # node data with attributes metadata
links.json # links data with attributes metadata
settings.json # global settings for the player
run_local.sh # simple utility to run a local server
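In a pipeline it can be useful to confirm that a build produced the files listed above before serving or publishing it. The sketch below is one hedged way to do that: `missing_outputs` is a hypothetical helper, and because the exact nesting of the files is not assumed, it searches the folder recursively by file name.

```python
from pathlib import Path

# File names taken from the listing above. Where each file sits inside
# the folder is not assumed, so the search below is recursive.
EXPECTED_FILES = (
    "index.html",
    "nodes.json",
    "links.json",
    "settings.json",
    "run_local.sh",
)


def missing_outputs(folder):
    """Return the expected file names that are absent under `folder`."""
    root = Path(folder)
    found = {p.name for p in root.rglob("*") if p.is_file()}
    return [name for name in EXPECTED_FILES if name not in found]
```

For example, `missing_outputs("data_out")` returns an empty list when every listed file is present somewhere under the folder.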
API
For more information about the project and layout configuration, please refer to Layouts Configuration, Project Configuration and Attributes Configuration.
py2mappr contains 2 main modules:
- mappr
- publish
mappr
This module contains the main API to generate a map. In order to import the module, use the following:
from py2mappr import mappr
It also contains the following helper methods:
- create_map(..): creates a map from the given data
- create_layout(..): creates a layout object and attaches to the project
- show(): opens the map in the browser
create_map(..)
This method is the main method to generate a map. It accepts the following arguments:
- data_frame: DataFrame, required. The data frame of datapoints with its attributes (columns) to be used in the project.
- network_df: DataFrame, optional. The data frame of edges with its attributes (columns) to be used in the project. Note that network_df is optional in this method, but it is expected to be set for the project, which can be done using set_network(..).
- layout_type: str, optional. The type of the layout to be created as the first layout in the project. The default is "clustered". Available values are: clustered, scatterplot, clustered-scatterplot, geo.
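Since layout_type is a plain string, a typo in it is easy to miss. The following sketch checks a value against the four names listed above before it is handed to create_map(..); `LAYOUT_TYPES` and `check_layout_type` are hypothetical names introduced here, and how py2mappr itself reacts to an unknown value is not described in this document.

```python
# The four layout names listed in this README.
LAYOUT_TYPES = ("clustered", "scatterplot", "clustered-scatterplot", "geo")


def check_layout_type(layout_type="clustered"):
    """Return `layout_type` unchanged, or raise ValueError if it is not listed."""
    if layout_type not in LAYOUT_TYPES:
        raise ValueError(
            f"unknown layout_type {layout_type!r}; "
            f"expected one of {', '.join(LAYOUT_TYPES)}"
        )
    return layout_type
```

The checked value could then be passed along, for instance as `mappr.create_map(nodes, links, layout_type=check_layout_type("geo"))`.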
create_layout(..)
This method is used to create a layout object and attach it to the project. It accepts the following arguments:
- data_frame: DataFrame, optional. The data frame of datapoints with its attributes (columns) to be used in the project. Should be provided if there is no current project.
- layout_type: PLOT_TYPE, optional. The type of the layout to be created. The default is "clustered". The available types are: clustered, scatterplot, clustered-scatterplot, geo.
publish
This module contains the API to publish the map to the local system or cloud storage. In order to import the module, use the following:
import py2mappr.publish as publisher
It provides the API to run the publishing tasks in a predefined order.
Run on local system
publisher.run([
    publisher.local()
])
The call above will build the current project and pass the current project path to the local task. The local task will start the server using http.server and open the map in the browser.
Publish on S3
publisher.run([
    publisher.s3("my-bucket-name")
])
The call above will build the current project and pass the current project path to the s3 task. The s3 task will fetch the AWS credentials from the config.ini file of the current Python project and upload the project to the given bucket.
Publish on S3 and set up CloudFront distribution
publisher.run([
    publisher.s3("my-bucket-name"),
    publisher.cloudfront("cdn.mydomain.com")
])
The call above will build the current project and pass the current project path to the s3 task. The s3 task will fetch the AWS credentials from the config.ini file of the current Python project and upload the project to the given bucket. The bucket name will be passed to the cloudfront task, which will create a new distribution for the given bucket with an alias of cdn.mydomain.com.
Expanding the publish flow
Custom tasks can be added to the publish flow. The custom task should be a function that accepts the dict of values returned by the previous task. The following is an example of a custom task:
def my_custom_task(data):
    # data is the dict of values returned by the previous task
    print("my custom task")
    print("project path: ", data.get("web_dir"))

publisher.run([
    my_custom_task
])
The following values can be accessed from the data dict:
- web_dir: the path to the project folder
- bucket: the name of the s3 bucket, returned from the s3 task
- cdn_url: the url of the cloudfront distribution, returned from the cloudfront task
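The hand-off described above, where each task receives the values produced by earlier ones, can be pictured with a small stand-alone runner. This is a sketch only and not py2mappr's implementation: `run_tasks`, `fake_upload` and `report` are hypothetical names, and merging each returned dict into one accumulated dict is an assumption, since the text does not say whether values are merged or replaced.

```python
def run_tasks(tasks, initial=None):
    """Call each task with the accumulated dict and merge any dict it returns."""
    data = dict(initial or {})
    for task in tasks:
        result = task(data)
        # Tasks that return nothing (like a print-only task) leave data as is.
        if isinstance(result, dict):
            data.update(result)
    return data


def fake_upload(data):
    # Hypothetical stand-in for an upload step: reports the bucket it "used".
    return {"bucket": "my-bucket-name"}


def report(data):
    # A custom task in the shape described above: it only reads from `data`.
    print("project path:", data.get("web_dir"))
    print("bucket:", data.get("bucket"))


final = run_tasks([fake_upload, report], {"web_dir": "data_out"})
```

Under these assumptions, `report` sees both the initial web_dir and the bucket added by the task before it, which mirrors how a custom task placed after s3 could read the bucket key.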