/r-python

Exploring data related to relative usage of R vs. python

Primary LanguageR

R and python usages

This repo is an attempt to use data to explore the claims in Python Displacing R As The Programming Language For Data Science and The homogenization of scientific computing, or why Python is steadily eating other languages’ lunch.

The individual files contain the R code that I used to gather data from each source, and the results are summarised below. I've made no attempt to separate python for data analysis from other uses of python, but hopefully the signals are still indicative. If you think my methodology is wrong, or you have other ideas for data sets, please send a pull request and I'll merge it in.

Stackoverflow questions

Using the stackexchange data explorer, I calculated the number of questions asked by month for both python and R. Overall, both R and python questions are growing explosively over time:

Explosive growth of R and python questions over time

A little further exploration (not shown) indicates that this is very close to being exponential growth.

If we standardise the number of R questions by the number of python questions, we see that the number of R questions is increasing more rapidly than python. Currently, about 1 question about R is asked for every three questions asked about python.

R questions growing relative to python

Github repos

Again we see exponential growth in both repos containing R code and repos containing python code (these number don't include forks), but R repo's are relatively less common than R questions. The big jump in repo creation in 2014 is probably due the JHU coursera course.

Explosive growth of R and python repos over time

If we standardise the number of R repos by the number of python repos, we see that R has been decreasing since the big jump in 2015.

R repos growing relative to python repos

Google trends

Looking at google trends data for people searching for language tutorials, both languages are relatively flat. Growth in searches for R tutorials is relatively flat, perhaps with a slight increases, while growth for python searches has been considerably more variable over time.

Some Python Data (but not much)

This is the data of monthly downloads made available from the Python PyPi Package Index. The plot shows the growth in several data analysis packages for Python. Somethig happens in March, 2013 when the growth explodes.

PyPi package downloads

Other ideas

  • Look at use of mailing lists. Is there a pydata specific mailing list?
  • Compare twitter hashtags: rstats, python, pydata?
  • Compare package downloads?
  • Number of Kaggle solution scripts written in R versus Python.
  • Number of Machine Learning courses on MOOC sites that use R versus Python.
  • Compare attendees at big R versus big Python data conferences year-over-year.