StackOverflow 2019 survey data analysis

Description

This project was completed as part of Udacity's Data Science Nanodegree Program.

The key steps of the project are:

Picking a dataset.
Posing at least three questions related to business or real-world applications of how the data could be used.
Performing necessary cleaning, analysis, and modeling.
- Data preparation:
  - Gather necessary data to answer the questions
  - Handle categorical and missing data
  - Provide insight into the chosen methods and why they were chosen
- Data analysis, modeling, and visualization to provide a clear connection between the business questions and how the data answers them.
Sharing the business insights with stakeholders.
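
The data preparation step listed above (handling categorical and missing data) could look roughly like the following sketch. It is not taken from the analysis notebook: the helper `prepare_features`, the toy DataFrame, and the column names are assumptions made for illustration, and mean imputation with one-hot encoding is only one of several reasonable choices.

```python
import pandas as pd


def prepare_features(df, target, numeric_cols, categorical_cols):
    """Return (X, y) with missing values handled and categoricals one-hot encoded.

    Hypothetical helper for illustration; not part of the project code.
    """
    # Rows without a target value cannot be used for supervised modeling.
    df = df.dropna(subset=[target])

    # Fill missing numeric values with the column mean (one simple choice).
    numeric = df[numeric_cols].fillna(df[numeric_cols].mean())

    # One-hot encode categoricals; dummy_na=True keeps "missing" as its own level.
    categorical = pd.get_dummies(df[categorical_cols], dummy_na=True)

    X = pd.concat([numeric, categorical], axis=1)
    y = df[target]
    return X, y


# Toy data with illustrative column names (assumed, not the real survey schema).
toy = pd.DataFrame({
    "OpenSourcer": ["Never", "Once a month or more often", None, "Never"],
    "YearsCode": [5.0, None, 3.0, 10.0],
    "Employment": ["Employed full-time", None, "Student", "Employed full-time"],
})
X, y = prepare_features(toy, "OpenSourcer", ["YearsCode"], ["Employment"])
print(X.shape)
```

Keeping missing categories as their own dummy column avoids silently dropping respondents who skipped a question; whether that is appropriate depends on the question being asked.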

The project follows the CRISP-DM (Cross-Industry Standard Process for Data Mining) methodology, which consists of six phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment.

The Stack Overflow Developer Survey data from 2019 is used to answer questions about Open Source Software (OSS) contributions.

The analysis notebook is available here

Create a Python 3.6 conda virtual environment

conda create --name py36 python=3.6

Activate the new environment

conda activate py36

Install the required packages by running the following command in the app's directory

pip install -r requirements.txt

Extract the data folder

unzip data/so_survey_2019/so_developer_survey_2019.zip -d data/so_survey_2019/

Run JupyterLab

jupyter lab
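
Where the unzip command is not available (for example on some Windows setups), the extraction step could be done from Python instead. The following is a minimal sketch using only the standard library; the helper name `extract_survey` is an assumption for illustration and is not part of the project.

```python
import zipfile
from pathlib import Path


def extract_survey(zip_path, dest_dir):
    """Extract every member of zip_path into dest_dir and return the member names."""
    dest = Path(dest_dir)
    # Create the destination folder if it does not exist yet.
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return archive.namelist()


# Example call, using the paths from the unzip command in the setup steps:
# extract_survey(
#     "data/so_survey_2019/so_developer_survey_2019.zip",
#     "data/so_survey_2019/",
# )
```

The function returns the list of extracted file names, which makes it easy to check what the archive contained before loading anything in the notebook.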

To view the notebook content and its outputs without running it, use nbviewer. An HTML version of the notebook can also be viewed here.