Creator: Ryan Bae
Date: October 18th, 2018
This project is an exercise in data curation and reproducible analysis: it downloads monthly Wikipedia page view data from Wikimedia's APIs and generates a plot. The idea is that any user can look at this repo and reproduce the results without difficulty.
The final en-wikipedia_traffic_200712-201809.csv file has the following schema:
- year (int)
- month (int)
- pagecount_all_views (int)
- pagecount_desktop_views (int)
- pagecount_mobile_views (int)
- pageview_all_views (int)
- pageview_desktop_views (int)
- pageview_mobile_views (int)
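As a quick check when reproducing the results, the schema above can be validated while loading the file. The following is a minimal sketch, assuming the CSV starts with a header row whose names match the schema and that every cell holds an integer; the helper name `read_traffic_rows` and the sample row are hypothetical and only for illustration.

```python
import csv
import io

# Column names taken from the schema listed above.
EXPECTED_COLUMNS = [
    "year",
    "month",
    "pagecount_all_views",
    "pagecount_desktop_views",
    "pagecount_mobile_views",
    "pageview_all_views",
    "pageview_desktop_views",
    "pageview_mobile_views",
]


def read_traffic_rows(handle):
    """Parse rows from an open CSV file object.

    Assumes a header row is present; raises ValueError if it does not
    match the expected schema. Every value is cast to int.
    """
    reader = csv.DictReader(handle)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    return [{name: int(row[name]) for name in EXPECTED_COLUMNS} for row in reader]


# Made-up sample row for illustration only; these are not real traffic numbers.
sample = io.StringIO(
    ",".join(EXPECTED_COLUMNS) + "\n"
    "2015,7,100,70,30,90,60,30\n"
)
rows = read_traffic_rows(sample)
```

To run it against the real file, pass an open file object instead of the in-memory sample, for example `open("en-wikipedia_traffic_200712-201809.csv", newline="")`.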
There are a few considerations to take into account when reproducing the results. There are some differences between the two APIs used to obtain the data.
- Legacy Pagecounts API data
  - Includes web spiders/crawlers
- Current Pageviews API data
  - Does not include web spiders/crawlers
  - Further divides mobile data into mobile-app and mobile-web data
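The differences above affect how requests are built and how the mobile columns are assembled. The sketch below is illustrative only: the URL templates and the `agent="user"` value reflect an assumed layout of Wikimedia's REST API and should be checked against the API documentation, and the function names and timestamps are hypothetical. It builds request URLs without making any network call and shows how mobile-app and mobile-web counts could be summed into a single mobile figure.

```python
# Assumed URL layouts; verify against the API documentation before use.
LEGACY_TEMPLATE = (
    "https://wikimedia.org/api/rest_v1/metrics/legacy/pagecounts/aggregate/"
    "{project}/{access}/{granularity}/{start}/{end}"
)
CURRENT_TEMPLATE = (
    "https://wikimedia.org/api/rest_v1/metrics/pageviews/aggregate/"
    "{project}/{access}/{agent}/{granularity}/{start}/{end}"
)


def legacy_url(access, start, end, project="en.wikipedia.org"):
    """Build a monthly request URL for the legacy API (no crawler filter available)."""
    return LEGACY_TEMPLATE.format(
        project=project, access=access, granularity="monthly", start=start, end=end
    )


def current_url(access, start, end, project="en.wikipedia.org", agent="user"):
    """Build a monthly request URL for the current API.

    agent="user" is assumed to be the setting that excludes spiders/crawlers.
    """
    return CURRENT_TEMPLATE.format(
        project=project,
        access=access,
        agent=agent,
        granularity="monthly",
        start=start,
        end=end,
    )


def combine_mobile(app_views, web_views):
    """Sum per-month mobile-app and mobile-web counts into one mobile total.

    Both arguments map a month key (e.g. "201507") to a view count;
    a month missing from one source counts as 0 for that source.
    """
    months = sorted(set(app_views) | set(web_views))
    return {m: app_views.get(m, 0) + web_views.get(m, 0) for m in months}


# Illustrative timestamps in YYYYMMDDHH form, not the project's actual ranges.
example_url = current_url("mobile-app", "2015070100", "2018100100")
example_mobile = combine_mobile({"201507": 10}, {"201507": 5, "201508": 7})
```

Keeping URL construction separate from the download step makes it easy to inspect the exact requests before running them.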
See the following resources for API documentation:
- Legacy Pagecounts API Documentation
- Legacy Pagecounts API Endpoint
When reproducing the results, please review the following terms of use:
The content accessed via Wikimedia's API is licensed under the CC-BY-SA 3.0 and GFDL licenses, and you irrevocably agree to release modifications or additions made through this API under these licenses.