- Set up Glue to provision and run the ETL pipeline, converting the Yelp dataset stored as JSON in an S3 bucket into a queryable format
- Create an S3 bucket to hold Athena query results
- Configure a Glue crawler to connect to the data source
- Write a series of Athena queries to compute aggregates, such as the ranking of states, then download the results in CSV format
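
The crawler step above can be sketched with boto3 roughly as follows. This is a minimal illustration, not the project's actual setup: the function name `create_and_start_crawler` and all argument values (crawler name, IAM role ARN, database name, S3 path) are hypothetical, and `glue` is assumed to be a client obtained from `boto3.client("glue")`.

```python
def create_and_start_crawler(glue, name, role_arn, database, s3_path):
    """Register a Glue crawler over an S3 prefix and start one run.

    All arguments are placeholders chosen by the caller; nothing here
    is taken from the original project configuration.
    """
    # Define the crawler: where to read from and which Data Catalog
    # database should receive the inferred table definitions.
    glue.create_crawler(
        Name=name,
        Role=role_arn,
        DatabaseName=database,
        Targets={"S3Targets": [{"Path": s3_path}]},
    )
    # Kick off a crawl; this call returns once the run has been requested,
    # not when the crawl has finished.
    glue.start_crawler(Name=name)
```

Once the crawl finishes, the inferred table should appear in the Glue Data Catalog and become queryable from Athena.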
Query example:
SELECT state, COUNT(*) AS num_states FROM yelp GROUP BY state ORDER BY num_states DESC LIMIT 10;
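
The same query can also be submitted programmatically instead of through the console. The sketch below assumes `athena` is a client from `boto3.client("athena")`; the helper name `run_athena_query`, the database name, and the output location are hypothetical and would need to match the catalog database created by the crawler and the result bucket created earlier.

```python
import time

# The example query from above, kept as a single statement.
QUERY = (
    "SELECT state, COUNT(*) AS num_states "
    "FROM yelp GROUP BY state ORDER BY num_states DESC LIMIT 10"
)


def run_athena_query(athena, query, database, output_location, poll_seconds=1.0):
    """Run a query in Athena and return the S3 path of its CSV result."""
    started = athena.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": database},
        # Athena writes the result file under this S3 location.
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]
        state = execution["Status"]["State"]
        if state == "SUCCEEDED":
            return execution["ResultConfiguration"]["OutputLocation"]
        if state in ("FAILED", "CANCELLED"):
            reason = execution["Status"].get("StateChangeReason", "no reason given")
            raise RuntimeError(f"Athena query {state}: {reason}")
        time.sleep(poll_seconds)
```

A call such as `run_athena_query(boto3.client("athena"), QUERY, "<database>", "s3://<result-bucket>/")` returns the path of the CSV file, which can then be downloaded from S3.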