autoESDA

A Python package that automates the exploratory spatial data analysis (ESDA) process by summarising the results into an HTML report.

Table of Contents

  1. Introduction
  2. Key features
  3. Installation
  4. Dependencies
  5. Usage
  6. Example Reports
  7. Contributing
  8. License
  9. References
  10. Credits

1. Introduction

Exploratory spatial data analysis (ESDA) describes the various functions used to gain a surface-level understanding of a spatial dataset. Currently, the ESDA process is repetitive because each of these functions needs to be run individually. This makes the process time-consuming and leaves a large margin for human error. Additionally, the results are seldom presented side by side, which makes them hard to compare and hard to share with people who lack the technical skills to produce them.

autoesda addresses this by allowing the user to execute one line of code that generates an information-rich HTML report, which can easily be shared with others.

2. Key features

  • HTML output report
  • Extent map
  • Dataset overview (coordinate system, number of rows/columns, which rows/columns have been included/excluded in the report)
  • Descriptive statistics (count, mean, standard deviation, minimum/maximum, 25th/50th/75th percentiles)
  • Sample of dataset
  • Boxplot
  • Histogram
  • Moran's I simulation (Moran's I, number of features, p-value, z-score, number of permutations)
  • Local Indicators of Spatial Association (local scatterplot, LISA cluster map)
  • Choropleth maps (quantiles, equal intervals, natural breaks, and percentiles classification schemes)
  • Correlation (correlation matrix/heatmap, pairwise plot)
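To make the Moran's I entry in the list above more concrete, here is a minimal sketch of how those reported quantities (Moran's I, number of features, p-value, z-score, number of permutations) relate to one another. It is not the code autoesda runs: the function names, the binary neighbour weights, and the pseudo p-value convention are all assumptions chosen for illustration, and only the Python standard library is used.

```python
import random
import statistics


def morans_i(values, weights):
    """Global Moran's I for a list of values and an n x n weights matrix."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]  # deviations from the mean
    total_weight = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / total_weight) * cross / sum(d * d for d in z)


def moran_permutation_test(values, weights, permutations=999, seed=0):
    """Compare the observed Moran's I against values shuffled across locations."""
    rng = random.Random(seed)
    observed = morans_i(values, weights)
    simulated = []
    for _ in range(permutations):
        shuffled = list(values)
        rng.shuffle(shuffled)  # break any spatial pattern
        simulated.append(morans_i(shuffled, weights))
    # Pseudo p-value: share of simulations at least as extreme as the
    # observed statistic (an assumed, commonly used convention).
    larger = sum(1 for s in simulated if s >= observed)
    extreme = min(larger, permutations - larger)
    p_value = (extreme + 1) / (permutations + 1)
    z_score = (observed - statistics.mean(simulated)) / statistics.pstdev(simulated)
    return {
        "morans_i": observed,
        "n_features": len(values),
        "p_value": p_value,
        "z_score": z_score,
        "permutations": permutations,
    }


# Hypothetical data: six cells in a row, each cell neighbouring the next one.
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
n = len(values)
weights = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]

result = moran_permutation_test(values, weights)
for key, value in result.items():
    print(key, value)
```

For this steadily increasing chain, Moran's I works out to 0.6, which indicates positive spatial autocorrelation. The p-value and z-score depend on the random permutations drawn.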

3. Installation

autoesda is available on PyPI. To install it, run this command in your terminal:

pip install autoesda

geopandas is a primary dependency of autoesda, and there are known challenges associated with installing geopandas using pip. The recommended strategy is therefore to use autoesda in a conda environment.

Advanced users can follow this documentation, which guides you through installing geopandas by downloading the unofficial binary files of some of its dependencies.

autoesda is also available on conda-forge. If you have Anaconda or Miniconda installed on your computer, you can run this command in your Anaconda/Miniconda prompt:

conda install -c conda-forge autoesda
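After either installation route, it can be useful to confirm that both packages are visible to the Python interpreter you intend to use. The sketch below relies only on the standard library; installed_version is a hypothetical helper name introduced for this example, not part of autoesda.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the status of autoesda and its primary dependency.
for name in ("geopandas", "autoesda"):
    version = installed_version(name)
    print(f"{name}: {version if version is not None else 'not installed'}")
```

If either package is reported as not installed, check that the prompt you ran the install command in uses the same environment as this interpreter.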

4. Dependencies

5. Usage

To start, import both geopandas and autoesda.

import geopandas as gpd
import autoesda

Once both libraries have been successfully imported, load your dataset as a GeoDataFrame using geopandas. For the compatible file types, see the geopandas documentation. In this example, a shapefile is loaded.

gdf = gpd.read_file(r'example-file-path\example-shapefile.shp')

Once your data is stored in a GeoDataFrame, you can generate the report.

autoesda.generate_report(gdf)

The report will be saved to your working directory.
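The three steps above can be combined into one small script. The sketch below is an assumption-laden convenience wrapper, not part of autoesda: build_report is a hypothetical helper, the path is the same placeholder used above, and pathlib is used so the path also works outside Windows. It checks that the file exists before importing the heavier libraries, then calls gpd.read_file and autoesda.generate_report exactly as shown above.

```python
from pathlib import Path


def build_report(dataset_path):
    """Load a dataset and generate the report; return the current working directory."""
    path = Path(dataset_path)
    if not path.exists():
        raise FileNotFoundError(f"Dataset not found: {path}")

    # Imported here so that a wrong path is reported before the heavy imports.
    import geopandas as gpd
    import autoesda

    gdf = gpd.read_file(path)
    autoesda.generate_report(gdf)
    # The report is saved to the working directory, so return that location.
    return Path.cwd()


# Placeholder path from the example above; replace it with a real dataset.
shapefile = Path("example-file-path") / "example-shapefile.shp"

if shapefile.exists():
    print("Report saved in:", build_report(shapefile))
else:
    print("Placeholder path, point it at a real dataset:", shapefile)
```

Returning the working directory makes it easy to locate the report when the script is run from a scheduler or a notebook whose working directory is not obvious.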

6. Example Reports

Vector Reports

  • Old COJ Demographic Data
  • Airbnb Chicago 2015
  • Grid 100
  • South African 2011 Census
  • Natural Earth Country Boundaries
  • Malaria in Colombia
  • USA Election Results

Raster Reports

  • Global Terrestrial Precipitation (Band 1 | Band 2 | Band 3 | Band 4 | Stacked)
  • EU NOx Concentration (Band 1 | Band 2 | Band 3 | Band 4 | Stacked)
  • South African Population (Band 1 | Band 2 | Band 3 | Band 4)

7. Contributing

Click here to report bugs

Click here to request a new feature

If you would like to assist with fixing bugs, further development, or writing documentation, you are most welcome to do so. Use the issues page to find tasks you can assist with.

In order to make a contribution you will need to:

  1. Fork the autoesda repository on GitHub.
  2. Clone your fork locally.
  3. Commit your changes and push them to your branch on GitHub.
  4. Once you are satisfied that your work is suitable, submit a pull request through the GitHub website.

8. License

This software is available under the BSD-3-Clause license.

For more information, see the LICENSE file which contains details on the history of this software, terms & conditions for usage, and a disclaimer of all warranties.

9. References

When citing this library, please reference the following:

de Kock, N., Rautenbach, V., and Fabris-Rotelli, I.: TOWARDS AN OPEN SOURCE PYTHON LIBRARY FOR AUTOMATED EXPLORATORY SPATIAL DATA ANALYSIS, Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci., XLIII-B4-2022, 91–98, https://doi.org/10.5194/isprs-archives-XLIII-B4-2022-91-2022, 2022.

10. Credits

This package was created with Cookiecutter and the giswqs/pypackage project template.