I analyzed the Billboard Hot 100 weekly charts from 1958 to 2019 to examine three relationships: popularity vs. relevance, the velocity of a song's climb to #1 vs. the length of its #1 streak, and Billboard peak position vs. the view count of the corresponding YouTube music video. I show that songs from the 2010s are not outperforming songs from the past, and I confirm that YouTube views correlate with higher Billboard peak positions, though I note that this relationship is confounded.
Refer to analysis.ipynb for the comprehensive analysis, code, and discussion. Please note that I am using NBViewer in order to render the interactive visualizations I built using Plotly.
This project was completed as part of the final project for DATA 512 (Human-Centered Data Science), University of Washington, Fall 2019.
This work is intended to be fully reproducible. Anyone should be able to run my code and reproduce exactly the results I have presented here. To try it out for yourself, please clone this repository:
git clone https://github.com/kfrankc/data512-final-project
The code repository has the following dependencies:
After installing dependencies, run the command jupyter notebook, which should bring you to a localhost environment where you can click on the analysis/ folder and click on analysis.ipynb, which is the Jupyter notebook you can follow for both my analysis and code. Ignore the tmp_data folder, as it contains all the temporary .csv files I generate throughout the notebook.
Feel free to contact me at kfrankc [at] uw edu if you have any questions about this analysis.
Both the Billboard and YouTube datasets are shared under the CC0 license. Links to the dataset websites can be found below. I have also added the two datasets to the raw_data folder; they are named hot_100.csv and yt_us_videos.csv, respectively.
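As a quick sanity check before opening the notebook, a minimal sketch like the one below can confirm that both raw files are present and readable. It is not part of the notebook. The helper name summarize_csv is hypothetical, the script assumes it is run from the directory that contains raw_data, and it assumes each file starts with a header row; it makes no assumption about the column names themselves.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (column_names, row_count) for a CSV file with a header row."""
    # errors="replace" keeps the check from failing on stray non-UTF-8 bytes.
    with open(path, newline="", encoding="utf-8", errors="replace") as f:
        reader = csv.reader(f)
        header = next(reader, [])  # first row is assumed to be the header
        row_count = sum(1 for _ in reader)  # remaining records
    return header, row_count


if __name__ == "__main__":
    # Assumption: raw_data is relative to the current working directory.
    raw_data = Path("raw_data")
    for name in ("hot_100.csv", "yt_us_videos.csv"):
        path = raw_data / name
        if path.exists():
            columns, rows = summarize_csv(path)
            print(f"{name}: {rows} rows, columns = {columns}")
        else:
            print(f"{name}: not found at {path}")
```

If either file is reported as not found, adjust the raw_data path to match where the folder sits in your clone.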