A report on the data wrangling efforts for the WeRateDogs tweet archive

We Rate Dogs - Data Wrangling

Introduction

Real-world data rarely comes clean. The goal of this project is to wrangle WeRateDogs Twitter data into a form that supports interesting and trustworthy analyses and visualizations. Using Python and its libraries, I gathered data from a variety of sources and in a variety of formats, assessed its quality and tidiness, and then cleaned it.
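As a rough illustration of the assess and clean steps described above, here is a minimal sketch in pandas. The sample frame, its column names, and the helper functions `assess` and `clean` are assumptions made for this example; they are not taken from the project's notebook.

```python
import pandas as pd

# Hypothetical sample standing in for the gathered tweet archive;
# the column names are assumptions, not the real schema.
raw = pd.DataFrame({
    "tweet_id": [1, 2, 2, 3],
    "timestamp": ["2017-01-01 10:00:00", "2017-01-02 11:30:00",
                  "2017-01-02 11:30:00", "2017-01-03 09:15:00"],
    "name": ["Charlie", "None", "None", "Bella"],
})

def assess(df):
    """Summarise a few common quality issues without changing the data."""
    return {
        "duplicate_rows": int(df.duplicated().sum()),
        "missing_values": int(df.isna().sum().sum()),
        "timestamp_is_datetime": bool(
            pd.api.types.is_datetime64_any_dtype(df["timestamp"])
        ),
    }

def clean(df):
    """Return a cleaned copy; the original frame is left untouched."""
    out = df.copy()
    # Drop exact duplicate rows and renumber the index.
    out = out.drop_duplicates().reset_index(drop=True)
    # Store timestamps as datetimes rather than text.
    out["timestamp"] = pd.to_datetime(out["timestamp"])
    # Treat the placeholder string "None" as a real missing value.
    out["name"] = out["name"].mask(out["name"] == "None")
    return out

issues = assess(raw)
cleaned = clean(raw)
```

Working on a copy keeps the raw data available for comparison, so each cleaning step can be checked by running `assess` again on the result.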

Besides documenting my wrangling efforts, I have showcased the results through analyses and visualizations built with Python and its libraries.

The dataset that I have wrangled (and analyzed and visualized) is the tweet archive of Twitter user @dog_rates, also known as WeRateDogs. WeRateDogs is a Twitter account that rates people's dogs with a humorous comment about the dog. These ratings almost always have a denominator of 10. The numerators, though? Almost always greater than 10. 11/10, 12/10, 13/10, etc. Why? Because "they're good dogs Brent." WeRateDogs has over 4 million followers and has received international media coverage.
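The rating format described above (a numerator over a denominator that is usually 10) can be pulled out of tweet text with a regular expression. The following is only a sketch: the pattern, the function name `extract_rating`, and the sample strings are assumptions for illustration, and taking the first fraction in the text is a simplification that can misread dates or other fractions.

```python
import re

# Matches fractions such as "13/10", including decimal numerators like "9.75/10".
RATING_RE = re.compile(r"(\d+(?:\.\d+)?)/(\d+)")

def extract_rating(text):
    """Return (numerator, denominator) for the first fraction found, or None."""
    match = RATING_RE.search(text)
    if match is None:
        return None
    return float(match.group(1)), int(match.group(2))

# Hypothetical tweet texts, not taken from the archive.
samples = [
    "This is Charlie. He is a very good boy. 13/10",
    "Meet Bella. She only stole one sock today. 9.75/10",
    "No rating in this one.",
]
ratings = [extract_rating(text) for text in samples]
```

Keeping the numerator as a float preserves decimal ratings, which would otherwise be truncated to their last digits by an integer-only pattern.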

Software needed:

You will need an installation of Python, plus the following libraries:

  1. pandas
  2. NumPy
  3. requests
  4. tweepy
  5. json (part of the Python standard library, so no separate installation is needed)
  • A text editor, like VS Code or Atom.
  • A terminal application (Terminal on Mac and Linux or Cygwin on Windows).
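To show how the json and pandas libraries listed above can work together, here is a minimal sketch that stores tweet data as one JSON object per line and reads it back into a DataFrame. The tweet dictionaries, the file name `tweet_json.txt`, and both helper functions are hypothetical; in practice the records would come from the Twitter API via tweepy, which needs credentials and is left out here.

```python
import json
import os
import tempfile

import pandas as pd

# Hypothetical tweet payloads; real ones would come from the Twitter API.
tweets = [
    {"id": 101, "retweet_count": 5, "favorite_count": 20},
    {"id": 102, "retweet_count": 0, "favorite_count": 3},
]

def write_json_lines(records, path):
    """Write each record as a single JSON object on its own line."""
    with open(path, "w", encoding="utf-8") as fh:
        for record in records:
            json.dump(record, fh)
            fh.write("\n")

def read_json_lines(path):
    """Read a one-object-per-line JSON file into a DataFrame."""
    rows = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines
                rows.append(json.loads(line))
    return pd.DataFrame(rows)

# The file name is an assumption made for this example.
path = os.path.join(tempfile.mkdtemp(), "tweet_json.txt")
write_json_lines(tweets, path)
counts = read_json_lines(path)
```

Writing one object per line means the file can be appended to as tweets arrive and parsed line by line, without loading a single large JSON document.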

Summary:

The report is summarized in the following two files, both of which are present in this repository:

  • For a brief overview of the data wrangling process, see wrangle_report.html
  • For visualizations and key insights, see act_report.pdf

References

  1. Reading and writing json to a file
  2. Unique Rating System of WeRateDogs
  3. Tidy Data Rules