/earth-dashboard-ds

Primary language: Jupyter Notebook | License: MIT


PlanetData.World

A website that teaches middle school students about the Earth and data visualization via interactive lessons.

DS Contributors

Charles Vanchieri Serina Grill Sean Hobin

Language: Python | Code style: prettier

Project Overview

Project Repositories

Tech Stack

Visualizations: Plotly, D3, Mapbox, Seaborn, Matplotlib

Services: AWS, Docker, Jupyter Notebooks, Postman

Languages: Python

Backend: AWS API Gateway, AWS Lambda, AWS RDS PostgreSQL, Flask, SQLAlchemy, Heroku, AWS CloudWatch

Predictive Modeling: Facebook Prophet, Random Forest Regressor

Getting Started

A note before you begin:

This application is primarily serverless: 10 packaged functions run on AWS Lambda.

  • 7 Lambda functions are accessible via AWS API Gateway. These endpoints return a JSON string: data that the DS team has formatted, filtered, and wrangled (and, for dynamic data, stored in the PostgreSQL database).

  • The other 3 Lambda functions are not accessible via AWS API Gateway. Instead, AWS CloudWatch rules trigger them to update existing database tables with new data from various external APIs. Each day, a CloudWatch rule triggers the 3 functions to parse data from the external APIs and update the summary table (used by the Global Cases visualization), loading today's data into AWS RDS PostgreSQL. Because the bubbles visualization reads from these tables, it updates along with them to show current data, with no manual intervention. Side note: you'll notice a third table in the database (uscounties). It was originally meant to be dynamic as well, but refreshing it proved too much for both Lambda functions and Heroku.
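The daily trigger described above could be set up programmatically with boto3, the AWS SDK for Python. The following is a minimal sketch, not the project's actual setup: the rule name, the `rate(1 day)` schedule, and the function name and ARN you pass in are assumptions, and `create_daily_rule` needs AWS credentials to run. Building the request arguments is split into its own function so it can be checked offline.

```python
from typing import Any, Dict, Tuple


def build_daily_rule_requests(rule_name: str, function_arn: str) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Build keyword arguments for the events client's put_rule and put_targets calls."""
    rule_kwargs = {
        "Name": rule_name,
        # Assumed schedule: fire once every 24 hours (a cron(...) expression also works)
        "ScheduleExpression": "rate(1 day)",
        "State": "ENABLED",
    }
    target_kwargs = {
        "Rule": rule_name,
        # The Lambda function the rule invokes; Id only needs to be unique within this rule
        "Targets": [{"Id": "refresh-lambda", "Arn": function_arn}],
    }
    return rule_kwargs, target_kwargs


def create_daily_rule(rule_name: str, function_name: str, function_arn: str) -> str:
    """Create the scheduled rule and allow it to invoke the function. Requires AWS credentials."""
    import boto3  # imported here so build_daily_rule_requests works without boto3 installed

    events = boto3.client("events")
    lambda_client = boto3.client("lambda")

    rule_kwargs, target_kwargs = build_daily_rule_requests(rule_name, function_arn)
    rule_arn = events.put_rule(**rule_kwargs)["RuleArn"]

    # Grant the rule permission to call the Lambda function
    lambda_client.add_permission(
        FunctionName=function_name,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )
    events.put_targets(**target_kwargs)
    return rule_arn
```

CloudWatch Events rules are now managed under Amazon EventBridge, which is served by the same `events` client in boto3.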

Why, you ask, is there a Flask app if this is all serverless?

  • One endpoint could not be made serverless (though you are welcome to try other cloud services, such as Google Cloud Functions). It lives in the Flask app, deployed to Heroku, and simply returns the data from the uscounties table so the web front end can visualize the heatmap.

Prerequisites

* Flask (preferred: Flask-SQLAlchemy, Flask-RESTPlus, Flask-Marshmallow)
* SQL, especially for PostgreSQL
* Knowledge of how to run an application locally
* Heroku or another hosting platform (if deployment is part of your work building on this project)

Installing

Clone this repository into the directory of your choosing. You'll need to put the Heroku environment variables (listed under Heroku Environment Variables below) into a .env file in your base directory.

To start the app locally, navigate to the FLASK directory from the command line and run:

flask run

When you open the app in your browser, you should see the Swagger API page:

Image of Swagger API

If you don't, your environment variables are probably missing or incorrect.

Running tests

Because the Flask API has only one endpoint, the application currently has only a few tests. Coverage should be expanded in a later build.

To run them, navigate to the application directory of the repository and run:

pytest test.py

Or from the FLASK directory of the repository you may type:

python -m application.test

These tests simply check that the external APIs the application requests data from return a response.
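A check of that kind might look like the sketch below. It is an illustration, not the repository's test.py: `endpoint_returns_json` is a hypothetical helper that treats an endpoint as healthy when it answers HTTP 200 with a body that parses as JSON. The `opener` parameter lets a test substitute a fake response instead of making a network call.

```python
import json
import urllib.request
from typing import Any, Callable


def endpoint_returns_json(
    url: str,
    opener: Callable[..., Any] = urllib.request.urlopen,
    timeout: float = 10.0,
) -> bool:
    """Return True if `url` answers with HTTP 200 and a body that parses as JSON."""
    try:
        with opener(url, timeout=timeout) as resp:
            if resp.status != 200:
                return False
            json.loads(resp.read().decode("utf-8"))
            return True
    except (OSError, ValueError):
        # OSError covers network and HTTP errors (urllib.error.URLError subclasses it);
        # ValueError covers bodies that are not valid UTF-8 JSON
        return False
```

In a real test, you would pass one of the URLs from the API Documentation section and assert that the result is True.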

Deployment to Heroku

Create a new app on Heroku. Deploying it requires a special git remote, called a Heroku remote, that points to the Heroku-hosted repository for your app. To add it to your local clone of this repository, first log in to Heroku with

heroku login

Once you have logged in, type

heroku git:remote -a whatever_you_named_your_app

Because the app cannot be run from the root directory of the repository, you MUST use

git subtree push --prefix FLASK heroku master

so that Heroku knows where the application is: this command pushes only the FLASK directory, where Heroku will look for the Pipfile. If you named your Heroku remote something other than 'heroku', replace 'heroku' in the command above with that name.

Data Sources

Landing Page Globe

COVID-19

Deforestation

Wildlife

Global Warming

API Documentation

Architecture

The backend is deployed serverlessly through AWS API Gateway and AWS Lambda, with one endpoint running on a Heroku server.

AWS API Gateway Endpoints

COVID-19 Global Cases Bubbles Visualization Data (AWS API Gateway and AWS Lambda)

URL

https://4eo1w5jvy0.execute-api.us-east-1.amazonaws.com/default/summary_db_query

Description

Returns the name and total confirmed cases for each country.

Schema

{
	"country": string,
	"totalConfirmed": number
}
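A Lambda function behind API Gateway could produce this schema roughly as sketched below. This is a hedged illustration, not the deployed code: the `summary` table and its `country`/`total_confirmed` columns are hypothetical names, the response is assumed to be a JSON array of objects in the schema above, and the handler assumes API Gateway's Lambda proxy integration, which expects a dict with `statusCode`, `headers`, and a string `body`. The database query is passed in as `fetch_rows` so the formatting can be exercised without a database; the default reads the `RDS_*` variables described under AWS Environment Variables.

```python
import json
import os
from typing import Any, Callable, Dict, Iterable, List, Tuple

Row = Tuple[str, int]  # (country, total confirmed cases)


def fetch_summary_rows() -> List[Row]:
    """Read (country, total) pairs from RDS. Table and column names are hypothetical."""
    import psycopg2  # imported lazily so the handler can be tested without a database driver

    # dbname falls back to libpq's default (the user name); pass dbname=... if yours differs
    conn = psycopg2.connect(
        host=os.environ["RDS_HOST"],
        user=os.environ["RDS_USERNAME"],
        password=os.environ["RDS_USER_PWD"],
    )
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT country, total_confirmed FROM summary")
            return cur.fetchall()
    finally:
        conn.close()


def lambda_handler(
    event: Dict[str, Any],
    context: Any,
    fetch_rows: Callable[[], Iterable[Row]] = fetch_summary_rows,
) -> Dict[str, Any]:
    """Return an API Gateway proxy response whose body is a JSON string."""
    records = [{"country": country, "totalConfirmed": total} for country, total in fetch_rows()]
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(records),
    }
```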

COVID-19 Global Cases Bubbles Visualization - Refresh Data (AWS Lambda and AWS CloudWatch)

URL

https://4eo1w5jvy0.execute-api.us-east-1.amazonaws.com/default/summary_db_add

Description

Pulls data from the covid/summary API and inserts it into the AWS RDS PostgreSQL database. Triggered once a day by an AWS CloudWatch rule.
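A refresh function of this kind might parse the API response and upsert it as sketched below. Several details are assumptions, not taken from the project: the response shape (a `Countries` list whose items carry `Country` and `TotalConfirmed`), the `summary` table, and a UNIQUE constraint on its `country` column, which `ON CONFLICT` requires. Fetching the response itself is left out, since the API's full URL is not given here.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical upsert; assumes a "summary" table with a UNIQUE constraint on "country"
UPSERT_SQL = (
    "INSERT INTO summary (country, total_confirmed) VALUES (%s, %s) "
    "ON CONFLICT (country) DO UPDATE SET total_confirmed = EXCLUDED.total_confirmed"
)


def parse_summary(payload: Dict[str, Any]) -> List[Tuple[str, int]]:
    """Extract (country, total confirmed) pairs from an assumed covid/summary response."""
    return [(c["Country"], int(c["TotalConfirmed"])) for c in payload.get("Countries", [])]


def write_rows(conn: Any, rows: List[Tuple[str, int]]) -> None:
    """Upsert all rows in one transaction; `conn` is a DB-API connection such as psycopg2's."""
    with conn.cursor() as cur:
        cur.executemany(UPSERT_SQL, rows)
    conn.commit()
```

Upserting keeps one row per country, so rerunning the daily job does not create duplicates; appending new rows each day would instead keep a history.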

COVID-19 Global Fatalities Racing Chart Data (AWS API Gateway and AWS Lambda)

URL

https://4eo1w5jvy0.execute-api.us-east-1.amazonaws.com/default/covidall_db_query

Description

Returns the country, date, and cumulative number of deaths from COVID-19.

Schema

{
	"country": string,
	"date": string ("yyyy/MM/dd"),
	"deaths": number
}
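A racing chart needs, for each date, the countries ranked by deaths. The sketch below shows one way to derive those frames from records in this schema; `racing_frames` and its `top_n` cutoff are hypothetical, not part of the API.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def racing_frames(records: List[Dict[str, Any]], top_n: int = 10) -> List[Tuple[str, List[Tuple[str, int]]]]:
    """Group records by date and keep the top_n countries by cumulative deaths on each date."""
    by_date: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for rec in records:
        by_date[rec["date"]].append((rec["country"], rec["deaths"]))

    frames = []
    # "yyyy/MM/dd" strings are zero-padded, so lexicographic order is chronological order
    for date in sorted(by_date):
        ranked = sorted(by_date[date], key=lambda pair: pair[1], reverse=True)
        frames.append((date, ranked[:top_n]))
    return frames
```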

Air Quality Line Graph Data (AWS API Gateway and AWS Lambda)

URL

https://4eo1w5jvy0.execute-api.us-east-1.amazonaws.com/default/airquality_query

Description

Returns a set of all dates, the date and daily mean PM2.5 concentration for each day, and the date and number of cases for each day. Only dates present in both the cases data and the air quality data are included.

Schema

{
	"dates": string ("M/d/yyyy")[],
	"airQuality": {
		"x": string ("M/d/yyyy"),
		"y": number
		}[],
	"cases": {
		"x": string ("M/d/yyyy"),
		"y": number
		}[]
}
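The shared-dates rule amounts to a set intersection. The sketch below is an assumption about how such a payload could be built, not the deployed function: it takes two hypothetical dicts keyed by `datetime.date` (daily mean PM2.5 and daily cases), keeps only dates present in both, and formats them as `M/d/yyyy` without zero padding.

```python
from datetime import date
from typing import Any, Dict


def _fmt(d: date) -> str:
    """Format as "M/d/yyyy" (no zero padding), matching the schema."""
    return f"{d.month}/{d.day}/{d.year}"


def build_air_quality_payload(pm25: Dict[date, float], cases: Dict[date, int]) -> Dict[str, Any]:
    """Keep only dates present in both series and emit the documented schema."""
    shared = sorted(set(pm25) & set(cases))  # date objects sort chronologically
    return {
        "dates": [_fmt(d) for d in shared],
        "airQuality": [{"x": _fmt(d), "y": pm25[d]} for d in shared],
        "cases": [{"x": _fmt(d), "y": cases[d]} for d in shared],
    }
```

The dates are formatted by hand because strftime's non-padded directives (such as `%-m`) are platform-specific.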

Deforestation Prediction Trends Line Graph Data (AWS API Gateway and AWS Lambda)

URL

https://4eo1w5jvy0.execute-api.us-east-1.amazonaws.com/default/deforestation_function

Description

Returns, for each country and year, the country name and code, agricultural land (sq. km), electric power consumption (kWh per capita), annual GDP per capita growth (%), livestock production index, ores and metals exports (% of merchandise exports), urban population, crop production index, food production index, and forest area (% of land area).

Schema

{
	"Country Name": string,
	"Country Code": string,
	"Year": number,
	"Agricultural land (sq. km)": number,
	"Electric power consumption (kWh per capita)": number,
	"GDP per capita growth (annual %)": number,
	"Livestock production index (2004-2006 = 100)": number,
	"Ores and metals exports (% of merchandise exports)": number,
	"Urban population": number,
	"Crop production index (2004-2006 = 100)": number,
	"Food production index (2004-2006 = 100)": number,
	"Forest area (% of land area)": number
}

Globe - Carbon Footprint Data (AWS API Gateway and AWS Lambda)

URL

https://4eo1w5jvy0.execute-api.us-east-1.amazonaws.com/default/globe_footprint

Description

Returns the name, latitude, longitude, and carbon footprint (magnitude) of each city.

Schema

[
	[city name, lat, lon, magnitude, city name, lat, lon, magnitude, city name, lat, lon, magnitude, ...]
]
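Because the inner list is flat, consumers must read it in groups of four. The hypothetical helpers below sketch both directions, assuming each city contributes exactly `(name, lat, lon, magnitude)` in that order, where magnitude is the carbon footprint value.

```python
from typing import List, Tuple, Union

City = Tuple[str, float, float, float]  # (name, lat, lon, magnitude)
Flat = List[Union[str, float]]


def to_globe_payload(cities: List[City]) -> List[Flat]:
    """Flatten city tuples into the single flat inner list described above."""
    flat: Flat = []
    for name, lat, lon, magnitude in cities:
        flat.extend([name, lat, lon, magnitude])
    return [flat]


def from_globe_payload(payload: List[Flat]) -> List[City]:
    """Inverse: read the flat inner list back in groups of four."""
    flat = payload[0]
    if len(flat) % 4:
        raise ValueError("expected groups of (name, lat, lon, magnitude)")
    return [tuple(flat[i:i + 4]) for i in range(0, len(flat), 4)]
```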

Bird Migration Ridgeplot Data (AWS API Gateway and AWS Lambda)

URL

https://4eo1w5jvy0.execute-api.us-east-1.amazonaws.com/default/migration_density

Description

Returns, for a given bird species, the number of sightings in 1970, 1975, 1981, 1985, 1990, 1998, 2004, 2011, and 2015.

Schema

{
	"1970": number,
	"1975": number,
	"1981": number,
	"1985": number,
	"1990": number,
	"1998": number,
	"2004": number,
	"2011": number,
	"2015": number
}

AWS Environment Variables

To re-create the AWS Lambda functions correctly, set up your own environment variables on each AWS Lambda function:

RDS_HOST = database url
RDS_USERNAME = username
RDS_USER_PWD = password

In addition, create a Dockerfile based on the Amazon Linux image to build the correct Python environment for the functions (we used Python 3.7). Refer to this article for help if needed.

Heroku Endpoint

COVID-19 US Cases Heatmap Data

URL

https://ds-backend-planetdata.herokuapp.com/covid/uscounties/query

Description

Returns the latitude, longitude, number of confirmed cases, and date for each county and day, plus a set of all dates.

Schema

{
	"cases": {
		"lat": number,
		"lon": number,
		"cases": number,
		"date": string ("MM/dd/yy")
		}[],
	"dates": string ("MM/dd/yy")[]
}
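The Heroku endpoint could be approximated with plain Flask as sketched below. This is not the project's Flask-RESTPlus code: `fetch_rows` is a hypothetical stand-in for the uscounties query, the row layout `(lat, lon, cases, date)` is assumed, and `create_app` imports Flask lazily so the formatting function can be tested on its own.

```python
from datetime import date
from typing import Any, Callable, Dict, Iterable, Tuple

Row = Tuple[float, float, int, date]  # (lat, lon, cases, date)


def format_uscounties(rows: Iterable[Row]) -> Dict[str, Any]:
    """Shape rows into {"cases": [...], "dates": [...]} with "MM/dd/yy" date strings."""
    cases = []
    dates = set()
    for lat, lon, n, day in rows:
        cases.append({"lat": lat, "lon": lon, "cases": n, "date": day.strftime("%m/%d/%y")})
        dates.add(day)
    # Sort distinct dates as date objects before formatting them
    return {"cases": cases, "dates": [d.strftime("%m/%d/%y") for d in sorted(dates)]}


def create_app(fetch_rows: Callable[[], Iterable[Row]]):
    """Build a minimal Flask app exposing the endpoint at the documented path."""
    from flask import Flask, jsonify  # lazy import so format_uscounties works without Flask

    app = Flask(__name__)

    @app.route("/covid/uscounties/query")
    def uscounties_query():
        return jsonify(format_uscounties(fetch_rows()))

    return app
```

Dates are sorted as `date` objects rather than strings because `MM/dd/yy` strings only sort chronologically within a single year.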

Heroku Environment Variables

For the Flask app to work correctly, you must set up your own environment variables.

Create a .env file that includes the following:

SQLALCHEMY_DATABASE_URI='postgresql+psycopg2://username:password@databaseurl'
TESTING=True
DEBUG=True
SQLALCHEMY_TRACK_MODIFICATIONS=False
SQLALCHEMY_ECHO=True
FLASK_APP=application.py
FLASK_ENV=development
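Environment variables always arrive as strings, so a value like `False` is truthy in Python unless it is converted. A minimal sketch of loading these settings into Flask config follows; `load_config` is a hypothetical helper, not code from the repository. With python-dotenv installed, the `flask` command loads `.env` into the environment automatically.

```python
import os
from typing import Any, Dict, Mapping

_TRUE = {"1", "true", "yes", "on"}


def _as_bool(value: str) -> bool:
    """Interpret common truthy strings; anything else (including "False") is False."""
    return value.strip().strip("'\"").lower() in _TRUE


def load_config(environ: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Read the .env-provided variables into Flask config keys."""
    uri = environ.get("SQLALCHEMY_DATABASE_URI")
    if not uri:
        raise RuntimeError("SQLALCHEMY_DATABASE_URI is not set; check your .env file")
    return {
        # Defensive: strip quotes in case a loader left them in place
        "SQLALCHEMY_DATABASE_URI": uri.strip().strip("'\""),
        "SQLALCHEMY_TRACK_MODIFICATIONS": _as_bool(environ.get("SQLALCHEMY_TRACK_MODIFICATIONS", "False")),
        "SQLALCHEMY_ECHO": _as_bool(environ.get("SQLALCHEMY_ECHO", "False")),
        "TESTING": _as_bool(environ.get("TESTING", "False")),
        "DEBUG": _as_bool(environ.get("DEBUG", "False")),
    }
```

In the app factory, `app.config.update(load_config())` would then apply these values.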

Initial Database Migration

You can quickly add a new table to the database and insert a large amount of data via a Jupyter notebook in Colab:

Data Migration Notebook