GeoVisual analytics, abbr. GeoViz, is the science of analytical reasoning assisted by GeoVisual map interfaces.
-
Phase I: Spatial Data Preparation: In this phase, the system first loads the designated spatial data from the database (e.g., Shape files, PostGIS, HDFS). Based on the application, the system may then need to perform a data processing operation (e.g., spatial range query, spatial join) on the loaded spatial data to return the set of spatial objects to be visualized.
-
Phase II: Map Visualization (MapViz): In this phase, the system applies the map visualization effect, e.g., Heatmap, on the spatial objects produced in Phase I. The system first pixelizes the spatial objects, calculates the intensity of each pixel, and finally renders a geospatial map tile(s).
Babylon is a full-fledged system that allows the user to load, prepare, integrate and execute GeoViz tasks on spatial data at scale. Most importantly, Babylon co-optimizes the spatial query operators (e.g., spatial join) and MapViz operators (e.g., pixelization) side by side. This prototype is implemented in Apache Spark, SparkSQL.
To combine both the map visualization and data preparation phases, Babylon allows users to declaratively define a geospatial visual analytics (GeoViz) task using a SQL-like language.
Babylon encapsulates the main steps of the geospatial map visualization process, e.g., pixelize spatial objects, aggregate overlapped pixels, render map tiles, into a set of massively parallelized Map visualization operators. Such MapViz operators provide out-of-the-box support for the user to declaratively generate a variety of map visualization effects, e.g., scatter plot, heat map.
Babylon combines spatial query operators (from existings systems, such as GeoSpark or others) with MapViz operators to assemble an execution plan. Currently, Babylon supports spatial range query and spatial join query operators.
Babylon employs a partitioner operator that fragments a given pixel dataset across the cluster. The partitioner accommodates map visual constraints and also balances the load among the cluster nodes when processing skewed geospatial data. Therefore, Babylon can easily render a map image tile using pixels on the same data partition.
The optimizer takes as input a GeoViz query and figures out an execution plan that co-optimizes the map visualization operators and spatial query operators (i.e., used for data preparation).
- Git clone
git clone https://github.com/DataSystemsLab/Babylon.git
- Compile
mvn clean install -DskipTests
- Play with example queries: GeoViz query example
We are still working on the stable release of Babylon GeoViz query optimizer with SparkSQL support. But, for now, you still can manually assemble an optimized execution plan which fits in your scenario. In the example mentioned above, we provide two optimized GeoViz query:
- Optimized GeoViz query 1: Visualize multiple range queries, each with an arbitrary query window, to multiple maps
- Optimized GeoViz query 2: Visualize spatial join query to a map
-
Jia Yu (Email: jiayu2 at asu.edu)
-
Mohamed Sarwat (Email: msarwat at asu.edu)
Babylon is one of the projects initiated by Data Systems Lab at Arizona State University. The mission of Data Systems Lab is designing and developing experimental data management systems (e.g., database systems).