This is the final project for Data 515 at the University of Washington, Spring 2020

Coronavirus Lifestyle Impacts Project

This project is a command-line tool that helps users see how lifestyles have been influenced by the 2019 novel coronavirus, using Google Trends data. Users supply a geographic location and search keywords, and the tool generates visualizations and summary statistics that summarize the impact. Each run produces an aggregated .csv file and a visualization to aid users in understanding the requested data.

How to run

From the repository root, run the following script:

./coronavirus_lifestyle_impacts.py [-h] [-s STATE] [-k KEYWORDS]

Example:

./coronavirus_lifestyle_impacts.py --state Washington --keyword "Bars near me, Home workouts"

A detailed example of this project can be found here.
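
The usage line above maps naturally onto Python's standard `argparse` module. The following is a minimal sketch of how such a parser could look, not the project's actual `cmd_parser.py`: the long option name `--keywords`, the helper names, and the comma-splitting behavior are assumptions inferred from the usage line and the example command.

```python
import argparse


def build_parser():
    """Build a parser mirroring the usage line: [-h] [-s STATE] [-k KEYWORDS]."""
    parser = argparse.ArgumentParser(
        prog="coronavirus_lifestyle_impacts.py",
        description="View lifestyle impacts of COVID-19 via Google Trends.",
    )
    # Long option names are assumptions inferred from the example command.
    parser.add_argument("-s", "--state", help="state name, e.g. Washington")
    parser.add_argument(
        "-k",
        "--keywords",
        help='comma-separated search terms, e.g. "Bars near me, Home workouts"',
    )
    return parser


def split_keywords(raw):
    """Split a comma-separated keyword string into a list of trimmed terms."""
    if not raw:
        return []
    return [term.strip() for term in raw.split(",") if term.strip()]


def parse_args(argv=None):
    """Parse argv (defaults to sys.argv[1:]) and return (state, keywords)."""
    args = build_parser().parse_args(argv)
    return args.state, split_keywords(args.keywords)
```

Under this sketch, `--keyword` in the example command is accepted because `argparse` resolves unambiguous prefixes of long options by default. Calling `parse_args(["--state", "Washington", "--keyword", "Bars near me, Home workouts"])` yields the state name and a list of two trimmed keywords.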

Table of contents

  • /coronavirus_lifestyle_impacts: all python code and unit tests
    • coronavirus-lifestyle-impacts.py: main script that takes state and keyword inputs and outputs a CSV file and a matplotlib visualization
    • cmd_parser.py: helper component to parse state names and search keywords from the command line
    • data_generator.py: helper component to download the latest data from the sources linked below: Bing for COVID-19 data and pytrends for Google Trends data on the search keywords
    • data_processor.py: helper component to clean data pulled from data_generator for later use in visualization
    • data_visualizer.py: helper component that takes a time series of pre-processed data and plots it with appropriate labels and time landmarks
    • /tests: unit tests for all components
  • /docs: all specs and presentations
    • project.png: image of component specification
    • component_specification.md: component spec document
    • functional_requirements.md: functional requirements document
    • tech_review.pdf: technology review presentation to compare various trending keyword Python packages
    • final_presentation.pdf: final class presentation of software package
  • /examples: walkthroughs of using the package

Data sources

This project uses two source datasets, plus the post-processed dataset derived from them:

  1. Bing COVID-19 Tracker

    • Data is available in CSV format in this GitHub repository

    • This data is updated daily (around 3 AM PST), with a 24-hour delay

    • Data contains the following columns:

      | Column header   | Description                                           |
      |-----------------|-------------------------------------------------------|
      | ID              | Unique identifier                                     |
      | Updated         | Datetime in UTC                                       |
      | Confirmed       | Confirmed case count for the region                   |
      | ConfirmedChange | Change of confirmed case count from the previous day  |
      | Deaths          | Death case count for the region                       |
      | DeathsChange    | Change of death count from the previous day           |
      | Recovered       | Recovered count for the region                        |
      | RecoveredChange | Change of recovered case count from the previous day  |
      | Latitude        | Latitude of the centroid of the region                |
      | Longitude       | Longitude of the centroid of the region               |
      | ISO2            | 2-letter country code identifier                      |
      | ISO3            | 3-letter country code identifier                      |
      | Country_Region  | Country/region                                        |
      | AdminRegion1    | Region within Country_Region                          |
      | AdminRegion2    | Region within AdminRegion1                            |
  2. PyTrends Data

    • Accessed via the pytrends Python package, which is an "Unofficial API for Google Trends"

    • Data is aggregated at a weekly level

    • Google Trends data is reported on a scale from 0 to 100, based on the relative proportion of searches for a keyword compared with all keywords included in the search over time

    • Interest over time query contains the following columns:

      | Column header    | Description                                                                        |
      |------------------|------------------------------------------------------------------------------------|
      | date             | First day of the week that the data represents                                     |
      | Trend keyword(s) | Trend keyword(s) you pass into the query, e.g. "Dogs for adoption"                 |
      | isPartial        | Boolean indicator of whether or not the full week of data for that trend is available yet |
  3. Post-Processed data

    • The aggregated, cleaned data fits the following format:

      | Column header    | Description                                                        |
      |------------------|--------------------------------------------------------------------|
      | Date             | First day of the week that the data represents                     |
      | Confirmed        | Number of COVID cases                                              |
      | ConfirmedChange  | Change in number of COVID cases                                    |
      | Deaths           | Number of COVID deaths                                             |
      | DeathsChange     | Change in number of COVID deaths                                   |
      | Recovered        | Number of COVID recoveries                                         |
      | RecoveredChange  | Change in number of COVID recoveries                               |
      | Country          | Country chosen (default United States)                             |
      | State            | State chosen                                                       |
      | Trend keyword(s) | Trend keyword(s) you pass into the query, e.g. "Dogs for adoption" |
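
Because the Bing data is daily and the Google Trends data is weekly, the two must be brought to a common weekly grain before they can share one table. The sketch below illustrates one way to do that with only the standard library. It is not the project's `data_processor.py`: the function names, the Sunday week boundary, the rule of keeping the last cumulative count and summing the daily changes, and all sample numbers are assumptions made for illustration. Only two of the COVID columns are shown, and `Updated` is simplified to a date.

```python
from collections import defaultdict
from datetime import date, timedelta


def week_start(day):
    """Return the Sunday on or before `day` (assumed week boundary)."""
    # date.weekday() gives Monday=0 ... Sunday=6, so Sunday needs offset 0.
    return day - timedelta(days=(day.weekday() + 1) % 7)


def weekly_covid(daily_rows):
    """Collapse daily rows into one row per week.

    Cumulative counts (Confirmed) keep the last value seen in the week;
    daily deltas (ConfirmedChange) are summed over the week.
    """
    weeks = defaultdict(lambda: {"Confirmed": 0, "ConfirmedChange": 0})
    for row in sorted(daily_rows, key=lambda r: r["Updated"]):
        bucket = weeks[week_start(row["Updated"])]
        bucket["Confirmed"] = row["Confirmed"]
        bucket["ConfirmedChange"] += row["ConfirmedChange"]
    return dict(weeks)


def join_with_trends(weekly, trend_rows, keyword):
    """Inner-join weekly COVID rows with weekly trend rows on the week start."""
    merged = []
    for trend in trend_rows:
        covid = weekly.get(trend["date"])
        if covid is None:
            continue  # no COVID data for this week, so skip it
        merged.append({"Date": trend["date"], **covid, keyword: trend[keyword]})
    return merged


# Made-up numbers for illustration only.
daily = [
    {"Updated": date(2020, 3, 1), "Confirmed": 10, "ConfirmedChange": 3},
    {"Updated": date(2020, 3, 4), "Confirmed": 25, "ConfirmedChange": 15},
    {"Updated": date(2020, 3, 8), "Confirmed": 40, "ConfirmedChange": 15},
]
trends = [
    {"date": date(2020, 3, 1), "Home workouts": 35},
    {"date": date(2020, 3, 8), "Home workouts": 60},
]
rows = join_with_trends(weekly_covid(daily), trends, "Home workouts")
```

Each entry in `rows` then has the shape of the post-processed table: a week-start `Date`, the weekly COVID figures, and one column per trend keyword. Weeks that appear in only one of the two inputs are dropped by the inner join; a real pipeline might prefer to keep them and fill the gaps.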

Team Members

  • David Wei
  • Lauren Heintz
  • Ratna Chembrolu
  • Tara Wilson
  • Zack Garcia

References