An ETL pipeline for scraping Google Play reviews of Sky: Children of the Light. I used Airflow for task scheduling, extracted the data with the google-play-scraper library, transformed it with pandas, and loaded it into a local MySQL database.
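The extract, transform, and load stages described above can be sketched as plain functions chained in order. This is a minimal illustration under stated assumptions, not the repository's code: the names `extract`, `transform`, `load`, and `run_pipeline` are hypothetical, the `fetch` callable stands in for the google-play-scraper call, plain dicts stand in for pandas, and an in-memory SQLite database stands in for MySQL so the example runs without a server. The reduced three-column table and the rule that drops empty reviews are likewise only illustrative.

```python
import sqlite3
from typing import Any, Callable, Dict, Iterable, List, Tuple

Row = Dict[str, Any]


def extract(fetch: Callable[[], Iterable[Row]]) -> List[Row]:
    """Pull raw reviews from a source; `fetch` stands in for the scraper call."""
    return list(fetch())


def transform(raw: List[Row]) -> List[Tuple[str, str, int]]:
    """Drop reviews with no text and trim whitespace (illustrative rules only)."""
    return [
        (r["review_id"], r["content"].strip(), r["rating"])
        for r in raw
        if r.get("content")
    ]


def load(conn: sqlite3.Connection, rows: List[Tuple[str, str, int]]) -> int:
    """Write rows to a reduced stand-in table and return the stored row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS review ("
        "review_id TEXT PRIMARY KEY, content TEXT, rating INTEGER)"
    )
    # SQLite syntax; re-inserting an existing review_id overwrites that row.
    conn.executemany("INSERT OR REPLACE INTO review VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM review").fetchone()[0]


def run_pipeline(fetch: Callable[[], Iterable[Row]], conn: sqlite3.Connection) -> int:
    """Run the three stages in the order a scheduler would trigger them."""
    return load(conn, transform(extract(fetch)))
```

Calling `run_pipeline(lambda: sample_reviews, sqlite3.connect(":memory:"))` with a small list of dicts exercises all three stages. In an Airflow setup each function would typically become its own task, with the scheduler enforcing the same ordering.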
Review Table
Column | Description |
---|---|
review_id | Google Play review ID |
user_name | Google username |
content | Text of the Google Play review |
rating | Star rating (1 to 5) |
thumbs_up_count | Number of users who found the review helpful |
version | Game version |
last_modified | Date on which the review was last modified |
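As a rough sketch of how scraped records might be reshaped into the columns above, the function below renames fields and discards everything else. The source keys (`reviewId`, `userName`, `score`, `thumbsUpCount`, `reviewCreatedVersion`, `at`) reflect my understanding of what google-play-scraper's `reviews()` returns and should be checked against the installed version. Mapping `at` to `last_modified` is an assumption, and `FIELD_TO_COLUMN` and `to_review_row` are hypothetical names.

```python
from typing import Any, Dict

# Assumed mapping from google-play-scraper result keys to the table's columns.
FIELD_TO_COLUMN = {
    "reviewId": "review_id",
    "userName": "user_name",
    "content": "content",
    "score": "rating",
    "thumbsUpCount": "thumbs_up_count",
    "reviewCreatedVersion": "version",
    "at": "last_modified",  # assumption: the scraper's timestamp is the last-modified date
}


def to_review_row(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the mapped fields, renamed to the table's column names."""
    row = {column: raw.get(field) for field, column in FIELD_TO_COLUMN.items()}
    rating = row["rating"]
    # The table documents ratings as 1 to 5; reject anything outside that range.
    if not isinstance(rating, int) or not 1 <= rating <= 5:
        raise ValueError(f"rating out of range: {rating!r}")
    return row
```

With pandas, the same renaming could be done with `DataFrame.rename(columns=FIELD_TO_COLUMN)` followed by selecting the mapped columns; the dict-based version is used here only to keep the sketch dependency-free.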
Folder Structure
|--- skyscraper
| |-- modules
| | |-- ...
| |-- skyscraper.py (Airflow DAG definition file)
|
|--- sql
| |-- create_sky_database.sql
| |-- review_dump.sql (SQL dump for reviews last modified between January 1st 2021 and May 23rd 2021)
References
Sky [Game]. (2020). Santa Monica (California): thatgamecompany.