Scrape Google Play Reviews for Sky: Children of the Light.

SkyScraper

An ETL pipeline for scraping Google Play reviews for Sky: Children of the Light. I used Airflow for task scheduling, extracted the data with the google-play-scraper library, transformed it with pandas, and loaded it into a local MySQL database.
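
As a rough illustration of the extract step, here is a minimal sketch of paging through newest-first reviews until a cutoff date is reached. It is not the code in the modules folder: the function names, the cutoff logic, and the app ID `com.tgc.sky.android` are assumptions made for this example. Only `reviews`, `Sort.NEWEST`, and the `at` key come from google-play-scraper itself.

```python
import datetime as dt


def fetch_reviews_since(cutoff, fetch_page, max_pages=50):
    """Collect reviews dated on or after `cutoff` (hypothetical helper).

    `fetch_page(token)` must return (list_of_review_dicts, next_token),
    with the reviews ordered newest first.
    """
    collected, token = [], None
    for _ in range(max_pages):
        batch, token = fetch_page(token)
        if not batch:
            break
        fresh = [review for review in batch if review["at"] >= cutoff]
        collected.extend(fresh)
        # An older review in a newest-first page means we can stop paging.
        if len(fresh) < len(batch):
            break
    return collected


def google_play_page(token, app_id="com.tgc.sky.android"):
    """Fetch one page from Google Play. The app ID is an assumption."""
    # Imported lazily so the paging helper can be tested without the library.
    from google_play_scraper import Sort, reviews

    return reviews(
        app_id,
        lang="en",
        country="us",
        sort=Sort.NEWEST,
        count=200,
        continuation_token=token,
    )
```

With the library installed, `fetch_reviews_since(dt.datetime(2021, 1, 1), google_play_page)` would run the real scrape. Passing the page fetcher in as an argument keeps the stopping logic testable without network access.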

Review Table

| Column | Description |
| --- | --- |
| review_id | Google Play review ID |
| user_name | Google username |
| content | Google Play review |
| rating | Rating (1 - 5) |
| thumbs_up_count | Number of users who found the review helpful |
| version | Game version |
| last_modified | Date on which the review was last modified |
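
To show how scraped records could be shaped into these columns, here is a minimal pandas sketch of the transform step. The keys on the left of `COLUMN_MAP` are the ones google-play-scraper returns for each review. The mapping onto the table's columns, the de-duplication rule, and the `transform_reviews` name are assumptions for this example, not the repository's actual code.

```python
import datetime as dt

import pandas as pd

# Assumed mapping from google-play-scraper keys to the review table columns.
COLUMN_MAP = {
    "reviewId": "review_id",
    "userName": "user_name",
    "content": "content",
    "score": "rating",
    "thumbsUpCount": "thumbs_up_count",
    "reviewCreatedVersion": "version",
    "at": "last_modified",
}


def transform_reviews(raw_reviews):
    """Turn a list of scraped review dicts into a table-shaped DataFrame."""
    # Keep only the mapped keys, in table order, and rename them.
    df = pd.DataFrame(raw_reviews, columns=list(COLUMN_MAP)).rename(columns=COLUMN_MAP)
    # If a review appears more than once, keep its most recent version.
    df = df.sort_values("last_modified").drop_duplicates("review_id", keep="last")
    # The table stores a date, so drop the time of day.
    df["last_modified"] = pd.to_datetime(df["last_modified"]).dt.date
    return df.reset_index(drop=True)


sample = [
    {"reviewId": "a1", "userName": "Ann", "content": "Nice", "score": 4,
     "thumbsUpCount": 1, "reviewCreatedVersion": "0.13.1",
     "at": dt.datetime(2021, 5, 1, 9, 30), "replyContent": None},
    {"reviewId": "b2", "userName": "Ben", "content": "Laggy", "score": 2,
     "thumbsUpCount": 0, "reviewCreatedVersion": "0.13.2",
     "at": dt.datetime(2021, 5, 2, 18, 0), "replyContent": None},
    {"reviewId": "a1", "userName": "Ann", "content": "Lovely", "score": 5,
     "thumbsUpCount": 3, "reviewCreatedVersion": "0.13.2",
     "at": dt.datetime(2021, 5, 3, 12, 0), "replyContent": None},
]
clean = transform_reviews(sample)
```

The resulting frame could then be written to MySQL, for example with `DataFrame.to_sql` and a SQLAlchemy engine, though how the load step is actually done here is not stated above.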

Folder Structure

  |--- skyscraper
  |    |-- modules
  |    |   |-- ... 
  |    |-- skyscraper.py (Airflow DAG definition file)
  |
  |--- sql
       |-- create_sky_database.sql 
       |-- review_dump.sql (SQL dump of reviews last modified between January 1, 2021 and May 23, 2021)

References
Sky [Game]. (2020). Santa Monica (California): thatgamecompany.