Team Leads (Contacts) : [Samuel Lawrence]: http://samuel-lawrence.co.uk/
Webscraper adapted from https://towardsdatascience.com/selenium-tutorial-scraping-glassdoor-com-in-10-minutes-3d0915c6d905
Inspiration for the project was based on Ken Jee's youtube series 'data science project from scratch' Major changes include:
- Unique model building approach based on sklearn ensemble module
- The model was deployed to production via streamlit on heroku url: https://glassdoor-fin-analyst.herokuapp.com/
- Updated webcrawler was in need of overall due to glassdoor's updated website
- Unique field objective
- Adapt web scraper for data for model
- Clean data for analysis
- Analyze data
- Submit findings
- Scale and Build Machine Learning Model
- Host product on heroku
The objective of this project is to further understand what it takes to be a financial analyst in London. This exercise will serve as a gateway to those seeking to become analyst themselves as well as create an entry point adapting a machine learning model in predicting what role may be expected in relation to the different variables.
- Inferential Statistics
- Machine Learning
- Data Visualization
- Predictive Modeling
- Python
- Pandas
- Numpy
- Matplotlib
- Nltk
- Wordcloud
- Seaborn
- Sklean
- Selenium
- Sklearn
As we move closer to the full cycle of graduates moving into the work force, the question has been posed is what does it take/what is it like to be a financial analyst? Some questions we plan on answering include:
-
What kind of salary should be expected?
-
What positions are the most popular?
-
Types of companies Hiring?
-
What industries are the most popular?
-
Similarities between different roles?
-
Other questions we might want answered as we explore the data some more?
- The data was gathered from Glassdoor job postings on 6/7/2020 via web scraper with the use of the Selenium Python library. As such, COVID-19 has remained a constant factor in our lives and should be taken into consideration.
- -1 represents data that wasn't specified in the job posting
- The sample size for this data set was 1,000 entries.
- We ran the web scraper multiple times to get a wider pool of data due to the number of missing data
- Some of the most common words mentioned in the analysis include: 'Problem Solving','Bachelor Degree','team' and 'attention to detail'
- Average salary came out to around 30K depending on the seniority level
- Most big corporations are doing the hiring at the moment
- With more data and better feature selection, users could calculate their exact salary