
Analysing and predicting salary trends for data science jobs in Singapore


TITLE: GA DSI07 Project 4: WEB SCRAPING JOB POSTINGS by Chua Chin Hon

[GitHub repo title: ga_project4_cch]

SUMMARY: This is my fourth assigned project at General Assembly (Singapore), part of a 12-week Data Science Immersive course. The project involves scraping job postings from a government jobs bank, then using the data to analyse salary trends and to test whether features of a posting are predictive of high pay.

I've also published a Medium post on my findings: https://medium.com/@chinhonchua/10-charts-to-guide-your-search-for-a-data-science-job-in-singapore-e4e3be9f1135

My answers for the project are in the following files in the notebooks folder:

1.0-cch-project4-Webscrape.ipynb

1.1-cch-project4-Data_Text_Cleaning.ipynb

1.2-cch-project4-Visualisation.ipynb

2.0-cch-project4-Question1_Modelling.ipynb

3.0-cch-project4-Question2_Modelling.ipynb

4.0-cch-project4-Summary_Report.ipynb

FOLDERS

Two folders: one for notebooks and one for data

FILES

Data folder [2 files, 1 sub-folder]

chromedriver: For the web-scraping exercise

jobs.csv: Original jobs dataset obtained via web scraping

housing.csv: Cleaned up CSV file

Notebooks folder [7 files]

1.0-cch-project4-Webscrape.ipynb: Web-scraping for jobs data

1.1-cch-project4-Data_Text_Cleaning.ipynb: Data cleaning and minor feature engineering

1.2-cch-project4-Visualisation.ipynb: Visualising key job and salary trends

2.0-cch-project4-Question1_Modelling.ipynb: Answering the business questions in Qn1

3.0-cch-project4-Question2_Modelling.ipynb: Answering the business questions in Qn2

4.0-cch-project4-Summary_Report.ipynb: A summary report of the key findings

README.md: The original list of questions for the project.
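
EXAMPLE

To illustrate the "high pay" question the modelling notebooks address, here is a minimal sketch of how a binary high-pay label might be derived from scraped salary data. It is not taken from the notebooks. The column name `salary_monthly`, the helper `label_high_pay`, the sample rows, and the choice of the median as the cutoff are all assumptions made for illustration. It uses only the Python standard library so it runs outside the notebook environment.

```python
import csv
import io
from statistics import median

# Hypothetical sample standing in for jobs.csv; the real column names
# and values in the scraped dataset may differ.
SAMPLE_CSV = """title,salary_monthly
Data Analyst,4000
Data Scientist,7000
Data Engineer,6500
Research Assistant,3200
"""


def label_high_pay(rows, salary_key="salary_monthly"):
    """Return copies of the rows with a boolean 'high_pay' flag.

    A posting counts as high pay when its salary is strictly above
    the median salary of all rows (an assumed definition).
    """
    salaries = [float(row[salary_key]) for row in rows]
    cutoff = median(salaries)
    return [dict(row, high_pay=float(row[salary_key]) > cutoff) for row in rows]


# Parse the sample the same way a real CSV file would be parsed.
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
labelled = label_high_pay(rows)

for row in labelled:
    print(row["title"], row["high_pay"])
```

A median cutoff gives roughly balanced classes, which keeps a classifier's baseline accuracy near 50% and makes model scores easier to interpret. A different threshold (for example, a fixed salary or the top quartile) would suit a different business question.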