A website that teaches middle school students about the Earth and data visualization via interactive lessons.
Charles Vanchieri | Serina Grill | Sean Hobin |
---|---|---|
Visualizations: Plotly, D3, Mapbox, Seaborn, Matplotlib
Services: AWS, Docker, Jupyter Notebooks, Postman
Languages: Python
Backend: AWS API Gateway, AWS Lambda, AWS RDS PostgreSQL, Flask, SQLAlchemy, Heroku, AWS CloudWatch
Predictive Modeling: Facebook Prophet, Random Forest Regressor
This application is primarily serverless: 10 packaged functions are deployed on AWS Lambda.
- 7 Lambda functions are accessible via AWS API Gateway. These endpoints return a JSON string: data that has been formatted, filtered, and wrangled by the DS team (and, for dynamic data, stored in the PostgreSQL database).
- The other 3 Lambda functions are not accessible via AWS API Gateway. Instead, they are triggered by AWS CloudWatch rules and update existing tables in the database with new data from various external APIs. Each day a CloudWatch rule triggers these 3 functions to parse data from the external APIs and load today's data into AWS RDS PostgreSQL, including the summary table behind the Global Cases visualization. Because the bubbles visualization also relies on these tables, it likewise updates to show current data. All of this is handled by these self-sufficient functions. Side note: you'll notice a third table in the database (uscounties); it was originally meant to be dynamic, but keeping it updated proved too much for both Lambda functions and Heroku.
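As a rough illustration of what an API Gateway-facing function returns, here is a minimal sketch of a Lambda handler. It assumes the Lambda proxy integration, which expects a dict with `statusCode`, `headers`, and a string `body`; `query_summary` is a hypothetical placeholder for the real RDS query, not the team's code.

```python
import json
from typing import Any, Dict, List


def query_summary() -> List[Dict[str, Any]]:
    # Hypothetical placeholder: the real function would SELECT from AWS RDS PostgreSQL.
    return [{"country": "Example", "totalConfirmed": 0}]


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Return the query result as a JSON string in an API Gateway proxy response."""
    rows = query_summary()
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        # API Gateway requires the body to be a string, hence json.dumps.
        "body": json.dumps(rows),
    }
```

The CloudWatch-triggered functions would use the same `lambda_handler(event, context)` entry point, but they write to the database instead of returning data to a caller.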
If this is all serverless, why is there a Flask app at all?
- There is 1 endpoint that could not be made serverless (though you're welcome to try with other cloud services, such as Google Cloud Functions). It lives in the Flask app, which is deployed to Heroku, and simply returns the data from the uscounties table so the web team can visualize the heatmap.
* Flask (preferred: Flask-SQLAlchemy, Flask-RestPlus, Flask-Marshmallow)
* SQL, especially for PostgreSQL
* Knowledge of how to run an application locally
* Heroku or another hosting platform (if it is part of the build-on for this project)
Go ahead and clone this repository into the directory of your choosing. You'll need to put the Heroku environment variables into a .env file in your base directory.
To start up the app locally, navigate to the FLASK directory via the CLI and type
flask run
When viewing in your browser, it should result in this:
Otherwise you may not have the right environment variables.
Because the Flask API has only 1 endpoint, this application unfortunately has only a few tests. Test coverage should be more robust in a later build.
To run the tests, navigate to the application directory of the repository and type:
pytest test.py
Or from the FLASK directory of the repository you may type:
python -m application.test
These tests simply check that the external APIs the application requests data from return a response.
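A reachability check of this kind can be sketched with only the standard library. `api_is_up` and its injectable `opener` parameter are hypothetical names, not the repository's test code; the opener defaults to `urllib.request.urlopen` and can be swapped for a fake so the check itself can be tested offline.

```python
import urllib.request
from typing import Any, Callable


def api_is_up(
    url: str,
    opener: Callable[..., Any] = urllib.request.urlopen,
    timeout: float = 10.0,
) -> bool:
    """Return True if the URL answers with HTTP 200, False on any network error."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # urllib.error.URLError, HTTPError and socket timeouts all subclass OSError.
        return False
```

In a pytest file, a live check would then be a test function containing `assert api_is_up(url)` with the real external API URL.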
Create a new app on Heroku. Deploying it requires a special type of git remote called a Heroku remote (a Heroku-hosted remote). You can add one to your local clone of the repository by first logging in to Heroku with
heroku login
Once you have logged in, type
heroku git:remote -a whatever_you_named_your_app
As the app cannot be run from the root directory of the repository, one MUST use
git subtree push --prefix FLASK heroku master
This pushes only the FLASK directory, so Heroku finds the application and its Pipfile at the root of what it receives. If you renamed your Heroku remote to something other than 'heroku', replace 'heroku' in the command above with the new name.
The backend is deployed serverlessly through AWS API Gateway and AWS Lambda, with one endpoint hosted on a Heroku server.
https://4eo1w5jvy0.execute-api.us-east-1.amazonaws.com/default/summary_db_query
Returns the name and total confirmed cases for each country.
{
"country": string,
"totalConfirmed": number
}
https://4eo1w5jvy0.execute-api.us-east-1.amazonaws.com/default/summary_db_add
Pulls data from the covid/summary API and inserts it into the AWS RDS PostgreSQL database. Triggered once a day by an AWS CloudWatch rule.
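A minimal sketch of the transformation step such a function might perform before inserting. It assumes the covid/summary response has a `Countries` list whose entries carry `Country` and `TotalConfirmed` keys; that payload shape, the `summary_rows` helper, and the `summary (country, total_confirmed)` table layout are all assumptions, not taken from the repository.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical table layout; psycopg2 uses %s placeholders.
INSERT_SQL = "INSERT INTO summary (country, total_confirmed) VALUES (%s, %s)"


def summary_rows(payload: Dict[str, Any]) -> List[Tuple[str, int]]:
    """Flatten an assumed covid/summary payload into (country, total_confirmed) tuples."""
    rows = []
    for entry in payload.get("Countries", []):
        country = entry.get("Country")
        total = entry.get("TotalConfirmed")
        if country is None or total is None:
            continue  # skip incomplete entries instead of inserting NULLs
        rows.append((country, int(total)))
    return rows
```

With psycopg2, the rows could then be written with `cursor.executemany(INSERT_SQL, rows)` inside the handler that the CloudWatch rule triggers.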
https://4eo1w5jvy0.execute-api.us-east-1.amazonaws.com/default/covidall_db_query
Returns the country, date, and cumulative number of deaths from COVID-19.
{
"country": string,
"date": string ("yyyy/MM/dd"),
"deaths": number
}
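Because `deaths` is cumulative, a client that only needs each country's current total can keep the record with the latest date. A minimal sketch, assuming the endpoint returns a list of objects shaped like the schema above; `latest_deaths` is a hypothetical helper.

```python
from datetime import date, datetime
from typing import Any, Dict, List, Tuple


def latest_deaths(records: List[Dict[str, Any]]) -> Dict[str, int]:
    """Map each country to its cumulative deaths on the most recent date."""
    latest: Dict[str, Tuple[date, int]] = {}
    for rec in records:
        # Dates arrive as "yyyy/MM/dd", i.e. strptime format "%Y/%m/%d".
        day = datetime.strptime(rec["date"], "%Y/%m/%d").date()
        country = rec["country"]
        if country not in latest or day > latest[country][0]:
            latest[country] = (day, rec["deaths"])
    return {country: deaths for country, (_, deaths) in latest.items()}
```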
https://4eo1w5jvy0.execute-api.us-east-1.amazonaws.com/default/airquality_query
Returns a set of all dates, the date and daily mean PM2.5 concentration for each day, and the date and number of cases for each day. Only dates present in both the cases and air quality data are included.
{
"dates": string ("M/d/yyyy")[],
"airQuality": {
"x": string ("M/d/yyyy"),
"y": number
}[],
"cases": {
"x": string ("M/d/yyyy"),
"y": number
}[]
}
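The shared-dates rule can be sketched as a set intersection followed by a chronological sort. The input dictionaries (date string to value) and the `build_airquality_response` name are assumptions for illustration. Sorting parses each date, because plain string sorting would put "10/1/2020" before "9/30/2020".

```python
from datetime import datetime
from typing import Any, Dict


def _parse(day: str) -> datetime:
    # strptime's %m and %d also accept non-padded values such as "4/1/2020".
    return datetime.strptime(day, "%m/%d/%Y")


def build_airquality_response(
    pm25: Dict[str, float], cases: Dict[str, int]
) -> Dict[str, Any]:
    """Keep only dates present in both inputs, in chronological order."""
    shared = sorted(set(pm25) & set(cases), key=_parse)
    return {
        "dates": shared,
        "airQuality": [{"x": d, "y": pm25[d]} for d in shared],
        "cases": [{"x": d, "y": cases[d]} for d in shared],
    }
```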
https://4eo1w5jvy0.execute-api.us-east-1.amazonaws.com/default/deforestation_function
Returns the country name, country code, year, agricultural land (sq. km), electric power consumption (kWh per capita), GDP per capita growth (annual %), livestock production index, ores and metals exports (% of merchandise exports), urban population, crop production index, food production index, and forest area (% of land area) for each country.
{
"Country Name": string,
"Country Code": string,
"Year": number,
"Agricultural land (sq. km)": number,
"Electric power consumption (kWh per capita)": number,
"GDP per capita growth (annual %)": number,
"Livestock production index (2004-2006 = 100)": number,
"Ores and metals exports (% of merchandise exports)": number,
"Urban population": number,
"Crop production index (2004-2006 = 100)": number,
"Food production index (2004-2006 = 100)": number,
"Forest area (% of land area)": number
}
https://4eo1w5jvy0.execute-api.us-east-1.amazonaws.com/default/globe_footprint
Returns the name, latitude, longitude, and carbon footprint (magnitude) of each city, flattened into a single array of repeating values.
[
[city name, lat, lon, magnitude, city name, lat, lon, magnitude, city name, lat, lon, magnitude, ...]
]
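Because the payload is a flat list, consumers have to read it four values at a time. A minimal parsing sketch, assuming every group is exactly `[city name, lat, lon, magnitude]`; `CityFootprint` and `parse_footprints` are hypothetical names.

```python
from typing import Any, List, NamedTuple


class CityFootprint(NamedTuple):
    name: str
    lat: float
    lon: float
    magnitude: float


def parse_footprints(payload: List[List[Any]]) -> List[CityFootprint]:
    """Split each flat [name, lat, lon, magnitude, ...] list into records."""
    records = []
    for flat in payload:
        if len(flat) % 4 != 0:
            raise ValueError("expected groups of [name, lat, lon, magnitude]")
        for i in range(0, len(flat), 4):
            name, lat, lon, magnitude = flat[i : i + 4]
            records.append(CityFootprint(str(name), float(lat), float(lon), float(magnitude)))
    return records
```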
https://4eo1w5jvy0.execute-api.us-east-1.amazonaws.com/default/migration_density
Returns the number of bird sightings for that species in 1970, 1975, 1981, 1985, 1990, 1998, 2004, 2011, and 2015.
{
"1970": number,
"1975": number,
"1981": number,
"1985": number,
"1990": number,
"1998": number,
"2004": number,
"2011": number,
"2015": number
}
To re-create the AWS Lambda functions correctly, you must set up your own environment variables in each AWS Lambda function:
RDS_HOST = database url
RDS_USERNAME = username
RDS_USER_PWD = password
In addition, create a Dockerfile based on the Amazon Linux image to build the correct Python environment (we used Python 3.7). Refer to this article for help if need be.
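A sketch of how a function might read the `RDS_HOST`, `RDS_USERNAME`, and `RDS_USER_PWD` variables. The database name is not among those variables, so the `dbname` default of `"postgres"` is an assumption; `rds_connection_kwargs` is a hypothetical helper whose result could be passed to `psycopg2.connect(**kwargs)`.

```python
import os
from typing import Dict

REQUIRED = ("RDS_HOST", "RDS_USERNAME", "RDS_USER_PWD")


def rds_connection_kwargs(dbname: str = "postgres") -> Dict[str, str]:
    """Collect connection settings from the Lambda function's environment."""
    missing = [name for name in REQUIRED if not os.environ.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {
        "host": os.environ["RDS_HOST"],
        "user": os.environ["RDS_USERNAME"],
        "password": os.environ["RDS_USER_PWD"],
        # Assumption: the README does not name the database, so it is a parameter.
        "dbname": dbname,
    }
```

Failing fast with a clear message makes a misconfigured function show up plainly in its CloudWatch logs instead of as an opaque connection error.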
https://ds-backend-planetdata.herokuapp.com/covid/uscounties/query
Returns, for each county and day, the latitude, longitude, number of confirmed cases, and date, plus a set of all dates.
{
"cases": {
"lat": number,
"lon": number,
"cases": number,
"date": string ("MM/dd/yy")
}[],
"dates": string ("MM/dd/yy")[]
}
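A minimal sketch of shaping query rows into this response, assuming `cases` and `dates` are both lists and each row arrives as `(lat, lon, cases, date)`; the row order and `build_uscounties_response` are hypothetical. In the Flask view, the result could be returned with `jsonify(...)`.

```python
from datetime import date
from typing import Any, Dict, Iterable, Tuple

# Hypothetical row layout: (lat, lon, cases, date)
Row = Tuple[float, float, int, date]


def build_uscounties_response(rows: Iterable[Row]) -> Dict[str, Any]:
    """Shape rows into {"cases": [...], "dates": [...]} with "MM/dd/yy" dates."""
    cases = []
    days = set()
    for lat, lon, count, day in rows:
        cases.append(
            {"lat": lat, "lon": lon, "cases": count, "date": day.strftime("%m/%d/%y")}
        )
        days.add(day)
    # Sort real dates before formatting; "MM/dd/yy" strings sort wrongly across years.
    return {"cases": cases, "dates": [d.strftime("%m/%d/%y") for d in sorted(days)]}
```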
For the Flask app to function correctly, you must set up your own environment variables.
Create a .env file that includes the following:
SQLALCHEMY_DATABASE_URI = 'postgresql+psycopg2://username:password@databaseurl'
TESTING=True
DEBUG=True
SQLALCHEMY_TRACK_MODIFICATIONS = False
SQLALCHEMY_ECHO=True
FLASK_APP=application.py
FLASK_ENV=development
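One detail worth knowing: environment variables are always strings, so `SQLALCHEMY_TRACK_MODIFICATIONS = False` arrives as the truthy string `"False"` unless it is converted. A minimal sketch of that conversion, assuming the values are already in `os.environ` (for example, loaded from .env by `flask run` when python-dotenv is installed); `env_bool` and `load_config` are hypothetical helpers, and the result could be applied with `app.config.update(load_config())`.

```python
import os
from typing import Any, Dict

_TRUTHY = {"1", "true", "yes", "on"}


def env_bool(name: str, default: bool = False) -> bool:
    """Convert a string environment variable to a real boolean."""
    value = os.environ.get(name)
    if value is None:
        return default
    # Strip whitespace and surrounding quotes, e.g. "'True'" becomes "true".
    return value.strip().strip("'\"").lower() in _TRUTHY


def load_config() -> Dict[str, Any]:
    """Build Flask/SQLAlchemy settings from the variables listed in .env."""
    return {
        "SQLALCHEMY_DATABASE_URI": os.environ.get("SQLALCHEMY_DATABASE_URI"),
        "SQLALCHEMY_TRACK_MODIFICATIONS": env_bool("SQLALCHEMY_TRACK_MODIFICATIONS"),
        "SQLALCHEMY_ECHO": env_bool("SQLALCHEMY_ECHO"),
        "TESTING": env_bool("TESTING"),
        "DEBUG": env_bool("DEBUG"),
    }
```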
You can quickly add a new table to the database and insert a large amount of data via a Jupyter notebook in Google Colab.
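A minimal sketch of such a bulk load from a notebook. To keep it runnable anywhere, it uses the standard library's `sqlite3` with an in-memory database as a stand-in for AWS RDS PostgreSQL; the `example_counts` table and its columns are hypothetical.

```python
import sqlite3
from typing import Iterable, Tuple


def load_rows(conn: sqlite3.Connection, rows: Iterable[Tuple[str, float, float, int]]) -> int:
    """Create a hypothetical table and insert many rows in one transaction."""
    with conn:  # commits on success, rolls back if anything raises
        conn.execute(
            "CREATE TABLE IF NOT EXISTS example_counts "
            "(name TEXT, lat REAL, lon REAL, value INTEGER)"
        )
        cursor = conn.executemany(
            "INSERT INTO example_counts (name, lat, lon, value) VALUES (?, ?, ?, ?)",
            rows,
        )
    return cursor.rowcount  # executemany sums the affected rows
```

Against PostgreSQL, the same pattern works through psycopg2 with `%s` placeholders instead of `?`, and `psycopg2.extras.execute_values` is usually faster for large inserts.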