DAND Project 2: Wrangle and Analyze Data

Real-world data rarely comes clean. Using Python and its libraries, this project gathers data from several sources in several formats, assesses its quality and tidiness, and then cleans it.

The dataset that we will be wrangling (and analyzing and visualizing) is the tweet archive of Twitter user @dog_rates, also known as WeRateDogs. WeRateDogs is a Twitter account that rates people's dogs and adds a humorous comment about each one. These ratings almost always have a denominator of 10. The numerators, though? Almost always greater than 10.
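
Because the rating is embedded in the tweet text as "numerator/denominator", one way to pull it out is a regular expression. The following is only a minimal sketch, not necessarily how the notebook does it: the function name `extract_rating`, the pattern, and the sample text are all assumptions made for illustration.

```python
import re

# Matches "<numerator>/<denominator>", allowing a decimal numerator such as 13.5/10.
RATING_PATTERN = re.compile(r"(\d+(?:\.\d+)?)/(\d+)")


def extract_rating(text):
    """Return (numerator, denominator) for the first rating-like match, or None."""
    match = RATING_PATTERN.search(text)
    if match is None:
        return None
    return float(match.group(1)), int(match.group(2))


# Made-up example text, not taken from the archive.
sample = "This is Bella. She loves the snow. 13/10 would pet"
print(extract_rating(sample))
```

Taking only the first match is a simplification. Text that contains other fractions (a date such as 7/11, for example) would be misread, which is the kind of quality issue the assessment step is meant to catch.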

The project consists of the following files:

wrangle_act.ipynb: code for gathering, assessing, cleaning, analyzing, and visualizing data.
wrangle_report.pdf: documentation of the data wrangling steps (gather, assess, and clean).
act_report.pdf: documentation of the analysis and insights drawn from the final data.
twitter_archive_enhanced.csv: the original data, as provided, before any wrangling.
image_predictions.tsv: image predictions file, downloaded programmatically.
tweet_json.txt: file constructed via the Twitter API.
twitter_archive_master.csv: combined and cleaned tweet data.
image_predictions_clean.csv: cleaned image prediction data.
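
To illustrate how a file such as tweet_json.txt might be read back in, here is a minimal sketch. It assumes the file stores one JSON tweet object per line, which this README does not state, and it uses an in-memory stand-in with made-up values rather than the real file. The helper name `read_tweet_json` and the choice of fields are hypothetical.

```python
import io
import json


def read_tweet_json(lines):
    """Collect id, retweet_count and favorite_count from JSON-lines input."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        rows.append({
            "tweet_id": tweet["id"],
            "retweet_count": tweet.get("retweet_count"),
            "favorite_count": tweet.get("favorite_count"),
        })
    return rows


# In-memory stand-in for tweet_json.txt; the values are made up.
fake_file = io.StringIO(
    '{"id": 1, "retweet_count": 5, "favorite_count": 20}\n'
    '\n'
    '{"id": 2, "retweet_count": 0, "favorite_count": 3}\n'
)
rows = read_tweet_json(fake_file)
print(rows)
```

With the real file, the same function could be called on an open file handle, for example `with open("tweet_json.txt") as f: rows = read_tweet_json(f)`, and the resulting rows joined to the archive on the tweet ID.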

SuHamza/DAND-P2-Data-Wrangling
