DAND Project 2: Wrangle and Analyze Data

Real-world data rarely comes clean. Using Python and its libraries, this project aims to gather data from a variety of sources and in a variety of formats, assess its quality and tidiness, then clean it.

The dataset that we will be wrangling (and analyzing and visualizing) is the tweet archive of Twitter user @dog_rates, also known as WeRateDogs. WeRateDogs is a Twitter account that rates people's dogs with a humorous comment about the dog. These ratings almost always have a denominator of 10. The numerators, though? Almost always greater than 10.
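The numerator/denominator pattern described above can be pulled out of tweet text with a regular expression. The following is a minimal sketch, not the notebook's actual code: the helper name `extract_rating`, the pattern, and the sample tweet text are all assumptions made for illustration.

```python
import re

# Assumed pattern: a number (optionally with decimals), a slash, then an integer.
RATING_RE = re.compile(r"(\d+(?:\.\d+)?)/(\d+)")


def extract_rating(text):
    """Return (numerator, denominator) for the first rating-like match, or None.

    Hypothetical helper for illustration only.
    """
    match = RATING_RE.search(text)
    if match is None:
        return None
    return float(match.group(1)), int(match.group(2))


# Made-up tweet text, not taken from the archive.
rating = extract_rating("This is Bella. 13/10 would pet again")
```

Taking the first match is a simplification: a tweet that mentions a date or a fraction before the rating would be parsed incorrectly, which is the kind of quality issue the assessment step is meant to catch.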

The project consists of the following files:

  • wrangle_act.ipynb: code for gathering, assessing, cleaning, analyzing, and visualizing the data.
  • wrangle_report.pdf: documentation of the data wrangling steps (gather, assess, and clean).
  • act_report.pdf: documentation of the analysis and the insights drawn from the final data.
  • twitter_archive_enhanced.csv: the original tweet archive, as provided, before wrangling.
  • image_predictions.tsv: image prediction file, downloaded programmatically.
  • tweet_json.txt: tweet data gathered via the Twitter API.
  • twitter_archive_master.csv: combined and cleaned tweet data.
  • image_predictions_clean.csv: combined and cleaned image prediction data.
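
The three input formats listed above (CSV, TSV, and a text file of JSON records) can be read and combined roughly as follows. This is a sketch using only the standard library, not the notebook's implementation. It assumes that tweet_json.txt holds one JSON object per line, that the archive is keyed by a `tweet_id` column, and that the API records carry `id`, `retweet_count`, and `favorite_count` fields. The helper names and the in-memory sample rows are hypothetical.

```python
import csv
import io
import json


def read_delimited(handle, delimiter=","):
    """Read a CSV or TSV stream into a list of dict rows."""
    return list(csv.DictReader(handle, delimiter=delimiter))


def read_json_lines(handle):
    """Parse one JSON object per non-empty line (assumed file layout)."""
    return [json.loads(line) for line in handle if line.strip()]


def join_on_tweet_id(archive_rows, api_rows):
    """Left-join archive rows with API counts, keyed by tweet ID."""
    counts = {str(row["id"]): row for row in api_rows}
    merged = []
    for row in archive_rows:
        extra = counts.get(row["tweet_id"], {})
        merged.append({
            **row,
            "retweet_count": extra.get("retweet_count"),
            "favorite_count": extra.get("favorite_count"),
        })
    return merged


# Tiny in-memory stand-ins for the real files; all values are made up.
archive = read_delimited(
    io.StringIO("tweet_id,text\n1,Good dog 13/10\n2,Also good 12/10\n")
)
predictions = read_delimited(
    io.StringIO("tweet_id\tp1\n1\tgolden_retriever\n"), delimiter="\t"
)
api = read_json_lines(
    io.StringIO('{"id": 1, "retweet_count": 5, "favorite_count": 9}\n')
)
merged = join_on_tweet_id(archive, api)
```

With the real files, the `io.StringIO` objects would be replaced by `open("twitter_archive_enhanced.csv", newline="")` and similar calls. A left join keeps every archived tweet even when the API returned nothing for it, so the missing counts show up as `None` and can be handled during cleaning.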