The goal of this project was to carry out data wrangling of the WeRateDogs Twitter data. The data was sourced from three different places and combined into a single file called twitter_archive_master.csv. It had many quality and tidiness issues, such as variables with the wrong data types and missing or repeated values. Some of these issues were addressed, and data visualization was carried out to summarize some of the key insights.
There are seven files in the repository:
• act_report.pdf - this file contains a concise summary of the important observations.
• image-predictions.tsv - dog breed prediction using neural networks (file provided by Udacity)
• tweet_json.txt - this text file contains all the downloaded tweets
• twitter-archive-enhanced.csv - uncleaned/messy data containing information such as 'tweet id', timestamp, etc.
• twitter_archive_master.csv - this master file contains the final cleaned data
• wrangle_act.html - the Python code that performs the data wrangling process
• wrangle_report.html - documentation of the data wrangling efforts
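As a rough illustration of how three sources like these can be combined on the tweet ID, here is a minimal sketch using only the Python standard library. It is not taken from wrangle_act.html: the column names (tweet_id, p1, retweet_count, favorite_count), the one-JSON-object-per-line layout of the tweet file, and the sample rows are all assumptions made for the example.

```python
import csv
import io
import json

# Tiny in-memory stand-ins for the three source files.
# These rows are made up for illustration; they are not project data.
ARCHIVE_CSV = "tweet_id,timestamp,rating_numerator\n1,2017-01-01,12\n2,2017-01-02,10\n"
PREDICTIONS_TSV = "tweet_id\tp1\n1\tgolden_retriever\n"
TWEETS_JSONL = (
    '{"id": 1, "retweet_count": 5, "favorite_count": 20}\n'
    '{"id": 2, "retweet_count": 3, "favorite_count": 9}\n'
)


def combine_sources(archive_file, predictions_file, tweets_file):
    """Inner-join the three sources on the tweet ID; return a list of row dicts."""
    # Index the tab-separated predictions by tweet ID (assumed column name).
    predictions = {
        row["tweet_id"]: row
        for row in csv.DictReader(predictions_file, delimiter="\t")
    }

    # Assumed layout: one JSON object per line, with the ID under "id".
    counts = {}
    for line in tweets_file:
        if line.strip():
            tweet = json.loads(line)
            counts[str(tweet["id"])] = {
                "retweet_count": tweet["retweet_count"],
                "favorite_count": tweet["favorite_count"],
            }

    # Keep only tweets that appear in all three sources.
    merged = []
    for row in csv.DictReader(archive_file):
        tweet_id = row["tweet_id"]
        if tweet_id in predictions and tweet_id in counts:
            merged.append({**row, **predictions[tweet_id], **counts[tweet_id]})
    return merged


master = combine_sources(
    io.StringIO(ARCHIVE_CSV),
    io.StringIO(PREDICTIONS_TSV),
    io.StringIO(TWEETS_JSONL),
)
```

With real files, the io.StringIO objects would be replaced by open file handles. An inner join is only one possible choice; it drops tweets that lack an image prediction or downloaded counts.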
Key insights from the analysis:
• Most tweets had a rating between 10/10 and 12/10; in fact, around 70% of the tweets had a rating in this range
• Golden Retriever is the most tweeted dog breed, followed by Labrador Retriever and Pembroke
• Standard Poodle had the most retweets on average, followed by English Springer and Afghan Hound (both of which had similar retweet counts)
• Even though Standard Poodle had more retweets on average, it was the Saluki breed that had the highest favorite counts (more than 20k)
• The most highly rated dog was Atticus, who had the highest rating numerator, 1776. With a bow tie and sunglasses he is truly a good boi
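
Summaries like the ones above (share of ratings in a range, most frequent breed, average retweets per breed) can be computed along the following lines. This is only a sketch with made-up rows and hypothetical field names (breed, rating_numerator, retweet_count). It does not reproduce the project's figures or its actual code.

```python
from collections import Counter, defaultdict
from statistics import mean

# Made-up rows for illustration only; field names are assumptions.
rows = [
    {"breed": "golden_retriever", "rating_numerator": 12, "retweet_count": 100},
    {"breed": "golden_retriever", "rating_numerator": 11, "retweet_count": 300},
    {"breed": "standard_poodle", "rating_numerator": 13, "retweet_count": 900},
    {"breed": "pembroke", "rating_numerator": 10, "retweet_count": 200},
]


def share_in_range(rows, low=10, high=12):
    """Fraction of rows whose rating numerator lies in [low, high]."""
    hits = sum(1 for r in rows if low <= r["rating_numerator"] <= high)
    return hits / len(rows)


def most_common_breed(rows):
    """Breed that appears in the largest number of rows."""
    return Counter(r["breed"] for r in rows).most_common(1)[0][0]


def mean_retweets_by_breed(rows):
    """Average retweet count for each breed."""
    grouped = defaultdict(list)
    for r in rows:
        grouped[r["breed"]].append(r["retweet_count"])
    return {breed: mean(values) for breed, values in grouped.items()}


rating_share = share_in_range(rows)
top_breed = most_common_breed(rows)
avg_retweets = mean_retweets_by_breed(rows)
```

Averages per breed can be skewed by breeds with very few tweets, so a minimum tweet count per breed may be worth applying before ranking.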