Real-world data rarely comes clean. The goal of this project is to wrangle WeRateDogs Twitter data to create interesting and trustworthy analyses and visualizations. Using Python and its libraries, I gathered data from a variety of sources and in a variety of formats, assessed its quality and tidiness, and then cleaned it as part of the data wrangling process.
In addition to documenting my wrangling efforts, I have showcased the results through analyses and visualizations built with Python and its libraries.
The dataset that I have wrangled (and analyzed and visualized) is the tweet archive of Twitter user @dog_rates, also known as WeRateDogs. WeRateDogs is a Twitter account that rates people's dogs with a humorous comment about the dog. These ratings almost always have a denominator of 10. The numerators, though? Almost always greater than 10. 11/10, 12/10, 13/10, etc. Why? Because "they're good dogs Brent." WeRateDogs has over 4 million followers and has received international media coverage.
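As a rough illustration of the gathering step, the sketch below reads a local archive file with pandas, downloads a hosted file programmatically with requests, and queries the Twitter API with tweepy. The file names, URL, and credentials are placeholders, not necessarily the exact ones used in this project.

```python
import json

import pandas as pd
import requests
import tweepy

# Placeholder names and URL -- adjust to the actual project files.
ARCHIVE_CSV = "twitter-archive-enhanced.csv"
PREDICTIONS_URL = "https://example.com/image-predictions.tsv"

# 1. Data provided as a flat file: read it directly with pandas.
archive = pd.read_csv(ARCHIVE_CSV)

# 2. Data hosted on a server: download it programmatically with requests.
response = requests.get(PREDICTIONS_URL)
with open("image-predictions.tsv", "wb") as f:
    f.write(response.content)
predictions = pd.read_csv("image-predictions.tsv", sep="\t")

# 3. Extra fields such as retweet and favorite counts: query the Twitter API
#    with tweepy and store each tweet's JSON so the API is only hit once.
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

with open("tweet_json.txt", "w") as f:
    for tweet_id in archive.tweet_id:
        try:
            status = api.get_status(tweet_id, tweet_mode="extended")
            json.dump(status._json, f)
            f.write("\n")
        except Exception:
            pass  # the tweet may have been deleted or be otherwise unavailable
```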
You will need an installation of Python, plus the following libraries and tools:
- pandas
- NumPy
- requests
- tweepy
- json (part of the Python standard library)
- A text editor, such as VS Code or Atom
- A terminal application (Terminal on Mac and Linux, or Cygwin on Windows)
- Git for Windows, if you want to use Git Bash as your terminal application
- Anaconda (latest version) for installing Python on Windows
- Visual Studio Code editor (for Windows)
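Once the environment is in place, the assess-and-clean step follows the usual pandas workflow: inspect the data programmatically, fix quality issues, and reshape for tidiness. The sketch below is illustrative only; the file and column names are assumed from the standard WeRateDogs archive rather than taken from this repository.

```python
import pandas as pd

# Assumed file name for the downloaded tweet archive.
archive = pd.read_csv("twitter-archive-enhanced.csv")

# Assess: look at structure, types, and obvious quality issues.
archive.info()
print(archive["rating_denominator"].value_counts().head())

# Clean: always work on a copy so the raw data stays untouched.
clean = archive.copy()

# Quality: keep original tweets only (assumed retweet column name).
clean = clean[clean["retweeted_status_id"].isnull()]

# Quality: timestamps are stored as strings; convert them to datetimes.
clean["timestamp"] = pd.to_datetime(clean["timestamp"])

# Tidiness: the four dog-stage columns encode one variable; collapse them
# into a single "stage" column (column names assumed, as above).
stage_cols = ["doggo", "floofer", "pupper", "puppo"]
clean["stage"] = (
    clean[stage_cols]
    .replace("None", "")
    .agg("".join, axis=1)
    .replace("", pd.NA)
)
clean = clean.drop(columns=stage_cols)

# Re-check the result of each fix before moving on.
clean.info()
```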
The work is summarized in the following two files, both of which are present in this repository:
- For a brief overview of the data wrangling process, see wrangle_report.html
- For the visualizations and key insights, see act_report.pdf