The code should run without issues on Python 3.* with the libraries listed in requirements.
For this project, I was interested in using Stack Overflow data from multiple years, starting with 2011, to better understand how the popularity of programming languages changed over time. The project currently covers the following questions:
- Which languages were the most popular each year?
- Did the Android platform experience visible shifts in language of choice over the years?
- What are the trends in the popularity of the top 10 languages?
- What is the influence of previous experience on present and future choices?
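As a rough illustration of the first question, the sketch below counts how often each language is mentioned per survey year. It is only a minimal example in plain Python, not the code used in the notebooks: the sample rows, the semicolon-separated language format, and the function name `top_languages_by_year` are assumptions made for illustration, and the real survey files use different column layouts from year to year.

```python
from collections import Counter

# Hypothetical sample rows: (survey year, semicolon-separated languages).
# The real survey data has many more columns and varies by year.
responses = [
    (2017, "Python;JavaScript"),
    (2017, "JavaScript;Java"),
    (2018, "Python;Java"),
    (2018, "Python"),
    (2018, "JavaScript;Python"),
]


def top_languages_by_year(rows, n=10):
    """Return {year: [(language, count), ...]} with the n most mentioned languages."""
    counts = {}
    for year, languages in rows:
        counter = counts.setdefault(year, Counter())
        # A respondent can list several languages; count each one once per respondent.
        counter.update({lang.strip() for lang in languages.split(";") if lang.strip()})
    return {year: counter.most_common(n) for year, counter in sorted(counts.items())}


top = top_languages_by_year(responses, n=2)
for year, ranking in top.items():
    print(year, ranking)
```

Counting mentions per respondent rather than per row keeps a language from being counted twice if it is listed more than once in the same answer.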
Four notebooks will be available here to showcase the work related to the above questions.
Each notebook explores the data pertaining to the question named in its title.
Markdown cells were used to assist in walking through the thought process for individual steps.
The following Jupyter Notebooks are part of the analysis (each of them answers one of the questions listed above):
A notebook named Analysis Presentation loads all the data and presents the analysis.
- What trends are in top 10 languages popularity?
- What is the influence of previous experience on present and future choices? [TBD]
- How does the number of years in programming influence the preferred or most used language? This could be done using scatterplots or heatmaps. Maybe also have a look at violin or box plots, faceting, or an adaptation of univariate plots. I can use the average number of years in programming on the Y axis. This compares a qualitative variable (most used language) with a quantitative one (number of years in programming).
- Does a developer's principal language influence the desire to learn a specific language in the future? This could be done using a scatterplot too. It may be better to explore correlation with other features as well.
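The two open questions above could start from simple aggregates before any plotting. The sketch below is one possible starting point in plain Python, not the planned implementation: the sample data and the function names `average_years_by_language` and `transition_counts` are hypothetical. The first function gives the average years in programming per most used language (the value proposed for the Y axis), and the second counts pairs of current and desired language, which is the kind of table a heatmap would display.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical sample: (most used language, years in programming).
experience = [
    ("Python", 3),
    ("Python", 5),
    ("Java", 10),
    ("Java", 8),
    ("JavaScript", 2),
]

# Hypothetical sample: (current principal language, language desired next).
desires = [
    ("Java", "Kotlin"),
    ("Java", "Kotlin"),
    ("Java", "Python"),
    ("Python", "Rust"),
]


def average_years_by_language(rows):
    """Average years in programming for each most used language."""
    years = defaultdict(list)
    for language, years_coding in rows:
        years[language].append(years_coding)
    return {language: mean(values) for language, values in years.items()}


def transition_counts(pairs):
    """Count how often each (current, desired) language pair occurs."""
    return Counter(pairs)


averages = average_years_by_language(experience)
transitions = transition_counts(desires)
print(averages)
print(transitions.most_common())
```

Averages hide the spread within each language group, which is why box or violin plots remain worth a look once the grouped values are available.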
Also, a set of Python files was used to support data preparation (data load, transformation, etc.):
As soon as the analysis is ready, the main findings will be published in the post available here.
Credit goes to Stack Overflow for the data. You can find the licensing for the data and other descriptive information at the link available here. Otherwise, feel free to use the code here as you would like!