Building a web portal that predicts salary and provides data analysis. Data is extracted from https://insights.stackoverflow.com/survey
- 26 Aug 2020
- ✔ 1. Issues Created and milestone added
- 27 Aug 2020
- ✔ 2. Data Analysis Done
- ✔ 3. Converted categorical parameters into int/float
- ✔ 4. Identified problematic features that have to be dealt with separately
- 28 Aug 2020
- ✔ 5. Figured out the combinational features that need to be treated as multi-class categories
- ✔ 6. Discussed the input format for the API
- ✔ 7. Discussed different methods for creating uniform data
- 29 Aug 2020
- ✔ 8. Data Cleaning
- ✔ 9. Treated YearCode data and removed some more features
- ✔ 10. Compensation data cleaned
- ✔ 11. Method of treating multi-class data decided
- 30 Aug 2020
- ✔ 12. Treated multi-class categorical data
- ✔ 13. Imputed missing values with the attribute mean (steps 12-15 are sketched in code after this log)
- ✔ 14. Normalized data
- ✔ 15. Checked Correlation
- ✔ 16. Improvement needed
- 31 Aug 2020
- ✔ 17. Working on correlation
- 1 Sept 2020
- ✔ 18. Landing page added
- ✔ 19. Predict Page added
- ✔ 20. Column analysis
- ✔ 21. Analysed each country's data and corresponding correlation
- 2 Sept 2020
- ✔ 22. Exploratory Data Analysis
- ✔ 23. Grouping data according to Country
- 23 Sept 2020
- ✔ 24. Model Created
- ✔ 25. Flask app in progress
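
A minimal sketch of the preprocessing steps logged above (combinational/multi-class encoding, mean imputation, normalization, and the correlation check). It assumes the survey CSV has been loaded with pandas; `LanguageWorkedWith` and `ConvertedComp` are column names from the Stack Overflow survey schema, while the file path and the choice of `MinMaxScaler` are illustrative assumptions, not the project's exact code.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Load the published survey results (path is illustrative).
df = pd.read_csv("survey_results_public.csv")

# Multi-answer ("combinational") columns such as LanguageWorkedWith are
# semicolon-separated; expand them into one 0/1 indicator column per option.
languages = df["LanguageWorkedWith"].str.get_dummies(sep=";")
df = pd.concat([df.drop(columns=["LanguageWorkedWith"]), languages], axis=1)

# Impute remaining numeric gaps with the attribute mean (step 13).
numeric_cols = df.select_dtypes(include="number").columns
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].mean())

# Normalize the numeric features to [0, 1] (step 14).
df[numeric_cols] = MinMaxScaler().fit_transform(df[numeric_cols])

# Check the correlation of each feature with the compensation target (step 15).
corr_with_target = df[numeric_cols].corr()["ConvertedComp"].sort_values(ascending=False)
print(corr_with_target.head(15))
```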
TO DO:
- Web Dev Part
  - Basic (Templates and technology to be used)
    - Front-end (HTML)
    - Front-end (CSS)
    - Front-end (JS)
    - Back-end and model deployment (Flask)
    - Hosting the web app
  - Page Structure (3 pages to be designed)
    - Landing Page (description of the application)
    - Forms Page (inputting the data)
    - Result and Data Analysis (predicted results and real-time analysis of data)
- Data Science Part
  - Basic (Model and technology/algorithms to be used)
    - Python (IDE to be used: Spyder/PyCharm)
    - Regression
      - Decision Tree Regressor
      - Random Forest Regressor
      - ANN Regression
- Web Dev Part
  - Deploying the model using Flask (see the sketch at the end of this list)
  - Creating real-time data analysis
  - Using an API
- Data Science Part
  - Imputing the data according to what we need (e.g., if a student doesn't have compensation it is assigned NA, but it is still important data)
  - Eliminating the non-useful parameters (manually or using correlation)
  - Creating categories
  - Handling a huge amount of data on a local machine
  - Model deployment
- To be added
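
Below is a minimal sketch of the model-plus-Flask deployment referenced above. It trains a Random Forest regressor (one of the candidate algorithms in the TO DO list) on the preprocessed features and exposes a prediction endpoint; the `train` helper, the `salary_model.pkl` file name, and the `/predict` route are illustrative assumptions rather than the project's actual code.

```python
import joblib
import pandas as pd
from flask import Flask, jsonify, request
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

def train(X: pd.DataFrame, y: pd.Series) -> None:
    """Fit a Random Forest on the cleaned features and persist it to disk."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    model = RandomForestRegressor(n_estimators=200, random_state=42)
    model.fit(X_train, y_train)
    print("R^2 on held-out data:", model.score(X_test, y_test))
    joblib.dump(model, "salary_model.pkl")

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    # Assumes train() has already produced salary_model.pkl and that the
    # POSTed JSON object uses the same feature names as the training columns.
    model = joblib.load("salary_model.pkl")
    features = pd.DataFrame([request.get_json()])
    prediction = model.predict(features)[0]
    return jsonify({"predicted_salary": float(prediction)})

if __name__ == "__main__":
    app.run(debug=True)
```

In this setup, the Forms Page would POST its fields as JSON to `/predict`, and the Result and Data Analysis page would render the returned `predicted_salary`.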