This project shows how to use Amazon Athena, AWS AppSync, and AWS Amplify to build an application that interacts with big data. The application is built using React, the AWS Amplify Javascript library, and the D3.js Javascript library to render custom visualizations.
The application code can be found in this GitHub repository. It uses Athena to query data hosted in a public Amazon S3 bucket by the Registry of Open Data on AWS. Specifically, it uses the High Resolution Population Density Maps + Demographic Estimates by CIESIN and Facebook.
This public dataset provides "population data for a selection of countries, allocated to 1 arcsecond blocks and provided in a combination of CSV and Cloud-optimized GeoTIFF files," and is hosted in the S3 bucket s3://dataforgood-fb-data.
- 03/30/2020: Use envCache to define unique bucket per build environment
- 03/21/2020: Update layout with tailwindcss.
- 12/19/2019: New branch
with-auth
provides an implementation that tracks request per owner in a DynamoDB table. In introduces a pipeline resolver with 2 functions to interact with Amazon Athena and AWS Lambda in a single query. Using a@dynamodb
and@auth
transformer, access to past queries is limited to the query owners and the Lambda functions that create and update the table entries. To view the intial release (minus the auth), go to the initial-release branch.type AthenaOperation @model @key( name: "ByOwner" fields: ["owner", "createdAt"] queryField: "queryByOwner" ) @auth( rules: [ { allow: owner, identityClaim: "sub" } { allow: private, provider: iam, operations: [create, update] } ] ) { id: ID! queryString: String! countryCode: String! status: STATUS! owner: String file: S3Object createdAt: String }
- 11/11/19: Updated to support multi auth with Amplify CLI. Requires Amplify CLI version 3.17 and above.
For more information about this app and how to get started, please see the blog post.
Deploy the application in a single step to the Amplify Console by clicking the button above.
Alternatively, you can clone the repository, deploy the backend with Amplify CLI, and build and serve the frontend locally.
First, install the Amplify CLI and step through the configuration.
npm install -g @aws-amplify/cli
amplify configure
Next, clone the repository and install the dependencies.
git clone https://github.com/aws-samples/aws-appsync-visualization-with-athena-app
cd aws-appsync-visualization-with-athena-app
yarn
Update the name of the storage bucket (bucketName
) in the file ./amplify/backend/storage/sQueryResults/parameters.json
then initialize a new Amplify project and push the changes.
amplify init
amplify push
Finally, launch the application.
yarn start
Here is how the application works:
- Users sign in to the app using Amazon Cognito User Pools. The JWT access token returned at sign-in is sent in an authorization header to AWS AppSync with every GraphQL operation.
- A user selects a country from the drop-down list and chooses Query. This triggers a GraphQL query. When the app receives the
QueryExecutionId
in the response, it subscribes to mutations on that ID. - Pipeline resolver:
- AWS AppSync makes a SigV4-signed request to the Athena API with the specified query.
- The results of the query request are sent to a Lambda function.
- Lambda function stores the operation details for the owner in a DynamoDB table using AWS AppSync.
- Athena runs the query against the specified table. The query returns the sum of the population at recorded longitudes for the selected country along with a count of latitudes at each longitude.
SELECT longitude, count(latitude) as count, sum(population) as tot_pop FROM "default"."hrsl" WHERE country='${countryCode.trim()}' group by longitude order by longitude
- The results of the query are stored in the result S3 bucket, under the
/protected/athena/
prefix. Signed-in app users can access these results using their IAM credentials. - Putting the query result file in the bucket generates an S3 event and triggers the announcer Lambda function.
- The announcer Lambda function sends an
announceQueryResult
mutation with the S3 bucket and object information. - The mutation triggers a subscription with the mutation's selection set.
- The client retrieves the result file from the S3 bucket and displays the custom visualization.