Generate Plotly graphs of the most viewed Wikipedia pages of the year

Wikipedia year in plotly

Generates Plotly graphs for a given year from the Wikimedia pageview API.

The process starts from the most viewed pages for each month of the year. That monthly data is then processed further to produce the graphs.
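
As a rough illustration of that first step, the sketch below builds one monthly "top" request URL per month and merges the monthly results into a single series per article. The URL path follows the public Wikimedia REST API pageviews "top" route. The function names monthlyTopUrls and mergeMonthlyTops are hypothetical and not taken from main.js, and the { article, views } entry shape is an assumption about the API response.

```javascript
// Hypothetical helpers; names are illustrative and not taken from main.js.
const API_BASE = 'https://wikimedia.org/api/rest_v1/metrics/pageviews/top';

// Build one "top articles" URL per month, using "all-days" for a monthly total.
function monthlyTopUrls(project, year) {
  const urls = [];
  for (let month = 1; month <= 12; month++) {
    const mm = String(month).padStart(2, '0');
    urls.push(`${API_BASE}/${project}/all-access/${year}/${mm}/all-days`);
  }
  return urls;
}

// Merge monthly lists into a Map of article -> [views per month].
// Assumes each entry looks like { article: 'Title', views: 123 }.
// Months where an article is missing from the top list are left as 0.
function mergeMonthlyTops(monthlyArticles) {
  const series = new Map();
  monthlyArticles.forEach((articles, monthIndex) => {
    for (const { article, views } of articles) {
      if (!series.has(article)) {
        series.set(article, new Array(monthlyArticles.length).fill(0));
      }
      series.get(article)[monthIndex] = views;
    }
  });
  return series;
}
```

Filling missing months with 0 is a simplification: an article that drops out of a month's top list still has views, they are just not reported by this route.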

You can find some of the generated plots here:

A description of these graphs can be found below:

  • Overview: A mixture of the "top" articles from the other graphs listed (peaks, change, total).
  • Peaks: Articles that had the highest monthly page view values in the year.
  • Change: Articles that had the largest change between their high and low in the year.
  • Total: Articles that had the most views overall in the year.
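
The three metrics in the list above can be sketched as follows, assuming each article has an array of monthly view counts. The names summarize and topBy are hypothetical; this is one plausible reading of the descriptions, not the script's actual implementation.

```javascript
// Hypothetical helper: summarise one article's monthly view counts.
function summarize(monthlyViews) {
  const peak = Math.max(...monthlyViews);   // highest monthly value
  const low = Math.min(...monthlyViews);    // lowest monthly value
  const total = monthlyViews.reduce((sum, v) => sum + v, 0);
  return { peak, change: peak - low, total };
}

// Rank articles by one metric ('peak', 'change' or 'total'), highest first.
// seriesByArticle is a Map of article title -> array of monthly views.
function topBy(seriesByArticle, metric, limit) {
  return Array.from(seriesByArticle, ([article, views]) => ({
    article,
    ...summarize(views),
  }))
    .sort((a, b) => b[metric] - a[metric])
    .slice(0, limit)
    .map((entry) => entry.article);
}
```

An overview could then be assembled by taking the first few titles from each of the three rankings and removing duplicates.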

The 2020 overview plot looks like this:

Interesting other links:

Running the code:

Install the dependencies using npm

npm install

To run this code you need a Plotly account, and you need to create the .plotly_user and .plotly_token files containing your details.
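
A minimal sketch of how such credential files might be read is shown below. It assumes each file contains a single plain-text value (the username or the API token) and lives in the directory the script is run from. Both the file format and the readPlotlyCredentials name are assumptions, not something confirmed by main.js.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Assumption: each file holds one plain-text value, possibly with a trailing newline.
function readPlotlyCredentials(dir = process.cwd()) {
  const read = (name) => fs.readFileSync(path.join(dir, name), 'utf8').trim();
  return {
    user: read('.plotly_user'),
    token: read('.plotly_token'),
  };
}
```

Keeping the token in a separate file rather than in the code makes it easier to exclude it from version control.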

To run for different years you currently need to alter the code.

Then run the script with a project and a year as arguments, such as "en.wikipedia" and "2020".

node main.js <project> <year>

Note: 2016 is the first year this will work for, because the pageview API holds only limited data for earlier years.
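
For illustration, the two arguments and the 2016 lower bound could be validated along these lines. The parseArgs function and its error messages are hypothetical and may differ from what main.js does.

```javascript
const FIRST_SUPPORTED_YEAR = 2016; // earliest year stated in the note above

// argv is expected in process.argv form: [node, script, project, year].
function parseArgs(argv) {
  const [project, yearText] = argv.slice(2);
  if (!project || !yearText) {
    throw new Error('Usage: node main.js <project> <year>');
  }
  const year = Number(yearText);
  if (!Number.isInteger(year) || year < FIRST_SUPPORTED_YEAR) {
    throw new Error(`Year must be an integer, ${FIRST_SUPPORTED_YEAR} or later`);
  }
  return { project, year };
}

// Example: parseArgs(process.argv) when run as `node main.js en.wikipedia 2020`
```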

If you want to dump the data as it passes through the script you can do something like:

DUMP_DATA=1 node main.js en.wikipedia 2020
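
One way such an environment switch can work is sketched below. The dumpData helper is hypothetical: it prints intermediate data as JSON only when DUMP_DATA is set, and the exact output format of the real script may differ.

```javascript
// Hypothetical helper: print intermediate data only when DUMP_DATA is set.
// Any non-empty value enables dumping; env is a parameter so it can be tested.
function dumpData(label, data, env = process.env) {
  if (!env.DUMP_DATA) {
    return false;
  }
  console.log(`--- ${label} ---`);
  console.log(JSON.stringify(data, null, 2));
  return true;
}
```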