YouTube data analysis

author: Hantang Li

YouTube, the world’s third most popular online destination, has grown from a video-sharing site into a source of income for content creators in both new and mainstream media (cite: https://www.elon.edu/u/academics/communications/journal/wp-content/uploads/sites/153/2017/06/06_Margaret_Holland.pdf). Individuals who upload videos to YouTube, also known as YouTubers, can turn on monetization features. One of the major ways YouTubers earn money is through ad views (https://support.google.com/youtube/answer/72857?hl=en). Since ad views depend on each video’s views, we would like to analyze which factors are associated with high view counts and how viewers’ preferences have changed in recent years.

To answer this question, we use publicly available datasets of past daily trending videos for YouTube's Canadian region. The largest such dataset we found is from Kaggle. It contains detailed daily trending video information collected using YouTube Data API v3, ranging from 2017-12-01 to 2018-05-31. The download link is https://www.kaggle.com/rsrishav/youtube-trending-video-dataset.

For comparison, we found another daily trending video dataset for the Canadian region on Kaggle, in a similar format and also collected using YouTube Data API v3. The data ranges from 2020-08-12 to 2022-03-07. The download link is https://www.kaggle.com/rsrishav/youtube-trending-video-dataset.

Link to this repository: https://github.com/Hantang-Li/Youtube-data-analysis

Link to the website: https://hantang-li.github.io/Youtube-data-analysis/

Link to the report: https://github.com/Hantang-Li/Youtube-data-analysis/blob/main/final_report.pdf

Project components

This project includes two reproducible reports and one script for data preprocessing.

    1. The website: The website's knitted HTML files are included in the docs folder. You can reproduce it by following the procedures in the data folder's readme file. The website provides a link to the final report and a presentation that introduces the website.
    1. The final report: The knitted PDF report is final_report.pdf. You can reproduce it by following the procedures in the data folder's readme file. The final report provides detailed preprocessing procedures and analysis of the YouTube trending data.
    1. (Extra) The preprocessing script: It records all the preprocessing code used to produce the preprocessed dataset behind the final report and the website.

Data set

You can find the preprocessed dataset df_CA_trending.Rda in the data folder, along with instructions on how to use it.
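
As a quick way to inspect the file, here is a minimal R sketch. It assumes the file sits at data/df_CA_trending.Rda relative to the repository root. The helper name load_rda_objects is hypothetical and not part of this repository, and the names of the objects stored inside the file are not assumed, so the sketch simply lists whatever it finds. The data folder's readme remains the authoritative guide.

```r
# Hypothetical helper (not part of this repository): load every object stored
# in an .Rda file into a named list, so nothing in the global environment is
# overwritten by accident.
load_rda_objects <- function(path) {
  if (!file.exists(path)) {
    stop("File not found: ", path)
  }
  env <- new.env()
  # load() returns the names of the objects it restored.
  object_names <- load(path, envir = env)
  mget(object_names, envir = env)
}

# Assumed location of the preprocessed dataset, relative to the repository root.
rda_path <- file.path("data", "df_CA_trending.Rda")

if (file.exists(rda_path)) {
  objects <- load_rda_objects(rda_path)
  print(names(objects))             # names of the stored objects
  str(objects[[1]], max.level = 1)  # brief overview of the first object
}
```

Loading into a separate environment keeps the session clean and shows the stored object names before you start working with the data.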