Prediction-of-Developer-s-Gross-Income

Objective

Build a web portal that predicts a developer's salary and presents an analysis of the underlying data. The data is extracted from the Stack Overflow Developer Survey: https://insights.stackoverflow.com/survey

Daily Progress:

  • 26 Aug 2020
    • ✔ 1. Issues Created and milestone added
  • 27 Aug 2020
    • ✔ 2. Data Analysis Done
    • ✔ 3. Converted categorical parameters into int/float
    • ✔ 4. Identified problematic features that have to be handled separately
  • 28 Aug 2020
    • ✔ 5. Identified combination features that need to be treated as multi-class categories
    • ✔ 6. Discussed the input format for the API
    • ✔ 7. Discussed different methods for making the data uniform
  • 29 Aug 2020
    • ✔ 8. Data Cleaning
    • ✔ 9. Treated YearCode data and removed some more features
    • ✔ 10. Compensation data cleaned
    • ✔ 11. Method of treating multi-class data decided
  • 30 Aug 2020
    • ✔ 12. Treated multi-class categorical data
    • ✔ 13. Imputed missing data with the mean of each attribute
    • ✔ 14. Normalized data
    • ✔ 15. Checked Correlation
    • ✔ 16. Improvement needed
  • 31 Aug 2020
    • ✔ 17. Working on correlation
  • 1 Sept 2020
    • ✔ 18. Landing page added
    • ✔ 19. Predict Page added
    • ✔ 20. Columns analysis
    • ✔ 21. Analysed each country's data and corresponding correlation
  • 2 Sept 2020
    • ✔ 22. Exploratory Data Analysis
    • ✔ 23. Grouping data according to Country
  • 23 Sept 2020
    • ✔ 24. Model Created
    • ✔ 25. Flask app in progress
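
The cleaning steps logged on 30 Aug (multi-class encoding, mean imputation, normalization, correlation check) can be sketched roughly as follows. This is a minimal illustration in plain Python, not the project's actual code: the rows, the column names (`languages`, `years_code`, `salary`) and the `;` separator are assumptions made for the example.

```python
from math import sqrt
from statistics import mean

# Hypothetical survey rows; column names are illustrative, not the real schema.
rows = [
    {"languages": "Python;SQL", "years_code": 5.0, "salary": 60000.0},
    {"languages": "JavaScript", "years_code": None, "salary": 45000.0},
    {"languages": "Python;JavaScript", "years_code": 10.0, "salary": 90000.0},
    {"languages": "SQL", "years_code": 3.0, "salary": 40000.0},
]

def multi_hot(rows, column, sep=";"):
    """Expand a multi-select column into one 0/1 column per option (in place)."""
    options = sorted({opt for row in rows for opt in row[column].split(sep)})
    for row in rows:
        chosen = set(row.pop(column).split(sep))
        for opt in options:
            row[f"{column}_{opt}"] = 1.0 if opt in chosen else 0.0
    return options

def impute_mean(rows, column):
    """Replace missing values (None) with the mean of the observed values."""
    fill = mean(row[column] for row in rows if row[column] is not None)
    for row in rows:
        if row[column] is None:
            row[column] = fill

def min_max(rows, column):
    """Rescale a numeric column to the range [0, 1]."""
    lo = min(row[column] for row in rows)
    hi = max(row[column] for row in rows)
    for row in rows:
        row[column] = (row[column] - lo) / (hi - lo) if hi > lo else 0.0

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

multi_hot(rows, "languages")
impute_mean(rows, "years_code")
min_max(rows, "years_code")
corr = pearson([row["years_code"] for row in rows],
               [row["salary"] for row in rows])
print(f"correlation(years_code, salary) = {corr:.3f}")
```

On the full survey export the same steps would more likely be done with a data-frame library; the plain-Python version is only meant to make each step visible.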

Creating Pipeline

TO DO:

  • Web Dev Part

    • Basic (Templates, and Technology to be used)

      1. Front-end ( HTML )
      2. Front-end ( CSS )
      3. Front-end ( JS )
      4. Back-end and model deployment ( Flask )
      5. Hosting the web app.
    • Page Structure (3 pages to be designed)

      1. Landing Page (Description about the application)
      2. Forms Page (Inputting the data)
      3. Results and Data Analysis Page (predicted results and real-time analysis of the data)
  • Data Science Part

    • Basic (Model, and Technology/Algorithms to be used)
      1. Python (IDE to be used: Spyder/PyCharm)
      2. Regression
      3. Decision Tree Regressor
      4. Random Forest Regressor
      5. ANN Regression
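
One way to compare the candidate models listed above is to fit each on the same train/test split and look at a single error metric. The sketch below assumes scikit-learn and NumPy are installed and uses synthetic data as a stand-in for the cleaned survey features; the hyperparameters are arbitrary, and the ANN regressor is left out to keep the example short.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data; the real features would come from the cleaned survey.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(500, 4))
y = 30000 + 50000 * X[:, 0] + 20000 * X[:, 1] ** 2 + rng.normal(0, 2000, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Arbitrary hyperparameters, chosen only for illustration.
models = {
    "linear": LinearRegression(),
    "decision_tree": DecisionTreeRegressor(max_depth=5, random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

# Mean absolute error on the held-out split, one score per model.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = mean_absolute_error(y_test, model.predict(X_test))

for name, mae in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: MAE = {mae:.0f}")
```

Mean absolute error is used here because it is in the same unit as the salary; any other regression metric could be swapped in.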

Challenges anticipated

  • Web Dev Part
    1. Deploying the model using Flask
    2. Creating real-time data analysis
    3. Using the API
  • Data Science Part
    1. Imputing the data according to what we need (e.g., if a student has no compensation, it is recorded as NA, but the record is still important data)
    2. Eliminating non-useful parameters (manually or using correlation)
    3. Creating Categories
    4. Handling a huge amount of data on a local machine
  • Model Deployment
  • To be added
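
For the "Deploying the model using Flask" challenge, a prediction endpoint might look roughly like the sketch below. It assumes Flask is installed. The `/predict` route, the field names in `REQUIRED_FIELDS` and the `predict_salary` function are all hypothetical: `predict_salary` is a placeholder formula standing in for the trained model, not the project's model.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical input fields; the real form fields are not specified here.
REQUIRED_FIELDS = ("years_code", "country")

def predict_salary(features):
    """Placeholder for the trained model; returns a made-up linear estimate."""
    return 30000.0 + 2500.0 * float(features["years_code"])

@app.route("/predict", methods=["POST"])
def predict():
    # Parse the JSON body; treat a missing or invalid body as empty.
    payload = request.get_json(silent=True) or {}
    missing = [field for field in REQUIRED_FIELDS if field not in payload]
    if missing:
        return jsonify({"error": f"missing fields: {missing}"}), 400
    try:
        salary = predict_salary(payload)
    except (TypeError, ValueError):
        return jsonify({"error": "years_code must be a number"}), 400
    return jsonify({"predicted_salary": salary})
```

Saved as `app.py`, this could be started locally with `flask --app app run`, and the forms page would then POST its inputs as JSON to `/predict`.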