Table of Contents

  1. Installation
  2. Project Motivation
  3. File Descriptions
  4. Results
  5. Licensing, Authors, and Acknowledgements

Installation

The code should run with no issues using Python 3.* and the libraries listed in the requirements file.

Project Motivation

For this project, I was interested in using Stack Overflow data from multiple years, starting in 2011, to better understand how the popularity of programming languages has changed over time. The project currently covers the following questions:

  1. Which languages were the most popular each year?
  2. Did the Android platform experience visible shifts in the language of choice over the years?
  3. What trends appear in the popularity of the top 10 languages?
  4. What is the influence of previous experience on present and future language choices?
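Questions 1 and 3 both come down to ranking languages per year. Below is a minimal sketch, assuming (hypothetically) that each response stores the languages a respondent uses as one semicolon-separated string; the toy RESPONSES data and the function names are illustrative, not taken from the project. Dividing by the number of respondents in each year keeps years with different response counts comparable, which matters when looking at trends.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (year, semicolon-separated languages one respondent reported).
# Real Stack Overflow files use different column names and formats across years.
RESPONSES = [
    (2011, "Java;C#"),
    (2011, "Java;PHP"),
    (2011, "C#"),
    (2012, "JavaScript;Java"),
    (2012, "JavaScript;Python"),
    (2012, "Python"),
]


def language_shares_by_year(responses):
    """Return {year: {language: share of that year's respondents who mention it}}."""
    mentions = defaultdict(Counter)
    respondents = Counter()
    for year, answer in responses:
        respondents[year] += 1
        # Use a set so a language listed twice by one respondent is counted once.
        languages = {lang.strip() for lang in answer.split(";") if lang.strip()}
        mentions[year].update(languages)
    return {
        year: {lang: count / respondents[year] for lang, count in counter.items()}
        for year, counter in mentions.items()
    }


def top_n(shares, n=10):
    """Most popular languages first; ties are broken alphabetically for stable output."""
    ranked = sorted(shares.items(), key=lambda item: (-item[1], item[0]))
    return [lang for lang, _ in ranked[:n]]


if __name__ == "__main__":
    shares = language_shares_by_year(RESPONSES)
    for year in sorted(shares):
        print(year, top_n(shares[year], n=3))
```

With real data, `top_n(shares[year])` would give a year's top 10, and following those shares across the sorted years would give the trend lines for question 3.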

File Descriptions

Four notebooks will be available here to showcase the work related to the questions above.
Each notebook explores the data relevant to the question given in its title.
Markdown cells walk through the thought process behind individual steps.
The Jupyter notebooks that make up the analysis are listed below (each answers one of the questions above):

Analysis Presentation: a notebook that loads all the data and presents the analysis.

  1. What trends appear in the popularity of the top 10 languages?
  2. What is the influence of previous experience on present and future language choices? [TBD]
    1. How does the number of years in programming influence the preferred/most used language? This could be done using scatterplots or heatmaps... Maybe also have a look at violin/box plots. Faceting? Adaptation of univariate plots? I can use the average number of years in programming on the Y axis. This is qualitative (most used language) vs. quantitative (number of years in programming).
    2. Does a developer's principal language(s) influence the desire to learn a specific language in the future? Could a scatterplot work here too? Maybe it is better to explore correlation with other features as well.
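The two sub-questions above pair a categorical answer with either a number or another categorical answer. Below is a minimal sketch of the tables that could sit behind the plots mentioned: the average years in programming per language (for the Y axis) and a main-language vs. desired-language count table (for a heatmap). The field names `years_coding`, `main_language`, and `wants_to_learn` and the toy RESPONDENTS list are hypothetical; the real survey columns differ, and multi-select answers would need splitting first.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical respondent records; the field names are illustrative, not the survey schema.
RESPONDENTS = [
    {"years_coding": 2, "main_language": "Python", "wants_to_learn": "Go"},
    {"years_coding": 10, "main_language": "Java", "wants_to_learn": "Kotlin"},
    {"years_coding": 6, "main_language": "Java", "wants_to_learn": "Go"},
    {"years_coding": 4, "main_language": "Python", "wants_to_learn": "Go"},
]


def mean_years_by_language(rows):
    """Average years in programming per most used language (qualitative vs. quantitative)."""
    years = defaultdict(list)
    for row in rows:
        years[row["main_language"]].append(row["years_coding"])
    return {lang: mean(values) for lang, values in years.items()}


def crosstab(rows, row_key, col_key):
    """Count how often each pair of answers occurs, e.g. main vs. desired language."""
    table = defaultdict(Counter)
    for row in rows:
        table[row[row_key]][row[col_key]] += 1
    return {key: dict(counts) for key, counts in table.items()}


if __name__ == "__main__":
    print(mean_years_by_language(RESPONDENTS))
    print(crosstab(RESPONDENTS, "main_language", "wants_to_learn"))
```

For box or violin plots, the per-language lists built inside `mean_years_by_language` would be plotted directly instead of being averaged.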

Also, a set of Python files was used to support data preparation (loading, transformation, etc.):

  1. data_load
  2. data_clean
  3. data_transform
  4. data_stats
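The support files are listed by name only, so their actual contents are not described here. Below is a minimal sketch of how separate load, clean, transform, and stats steps could be chained, assuming (hypothetically) a CSV export with `Year` and `Languages` columns. The functions `load`, `clean`, `transform`, `stats`, and `run_pipeline` are illustrative stand-ins named after the modules, not their real code.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for a raw export; the real files and columns differ by year.
RAW_CSV = "Year,Languages\n2011,Java;C#\n2011,Java ; PHP\n2012,\n2012,Python\n"


def load(text):
    """Load step (cf. data_load): parse CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Clean step (cf. data_clean): drop rows with no language answer."""
    return [row for row in rows if (row.get("Languages") or "").strip(" ;")]


def transform(rows):
    """Transform step (cf. data_transform): one (year, language) pair per mention."""
    pairs = []
    for row in rows:
        for lang in (row.get("Languages") or "").split(";"):
            lang = lang.strip()
            if lang:
                pairs.append((int(row["Year"]), lang))
    return pairs


def stats(pairs):
    """Stats step (cf. data_stats): count mentions per (year, language)."""
    return Counter(pairs)


def run_pipeline(text):
    """Chain the four steps in the order the support files are listed."""
    return stats(transform(clean(load(text))))


if __name__ == "__main__":
    for (year, lang), count in sorted(run_pipeline(RAW_CSV).items()):
        print(year, lang, count)
```

Keeping each step a plain function that takes the previous step's output makes every stage testable on its own and easy to reuse from the notebooks.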

Results

Once the analysis is ready, the main findings will be published in the post available here.

Licensing, Authors, and Acknowledgements

Credit must be given to Stack Overflow for the data. You can find the licensing for the data and other descriptive information at the link available here. Otherwise, feel free to use the code here as you would like!