The project's objective is to gather job-related information from Instahyre using Python's Selenium library and organize it in a specified format. The collected data is then split into three separate tables (jobs, company, and details) using the Pandas library. A search bar built with the Flask web framework lets users look up skills. The search results display the experience level, industry, and company class in which the skill is most in demand, along with the number of available job opportunities. The FuzzyWuzzy library corrects input errors made by users in the search bar.
The aim of the project is to automate job data collection from Instahyre using Selenium, structure it into tables with Pandas, provide a user-friendly search interface built with Flask, and improve search accuracy with FuzzyWuzzy. This saves time, provides detailed job information, improves search precision, supports analysis of job trends, and helps match candidates with suitable positions.
Files/Folder | Description |
---|---|
Phase - 1 | Includes the following folders: Table creation (creating database tables), Data Analysis (analyzing data sets), Web Scraping (extracting data from websites) |
Phase - 2 | Includes the following folders: App Logics (implementation of application logic), Data Preprocessing and Model Creation (data preparation and development of machine learning models), App (final application code) |
- Jobs Table:
Column Name | Description |
---|---|
JobID | Primary key for Jobs table |
Designation | The designation of the job |
Industry | Industry of the company offering the job |
Location | Location of the job |
Skills | Skills required for the job |
DetailID | A key to map with details table, as every job has some description |
CompanyID | A key to map with company table, as one company can have multiple jobs |
- Company Table:
Column Name | Description |
---|---|
CompanyID | Primary key for Company table |
Name | Name of the company posting the job listings |
Founded | Year the company was founded |
Employees | Total number of employees in the company |
- Details Table:
Column Name | Description |
---|---|
DetailID | Unique identifier for each set of additional details |
Skills | Skills or qualifications required for the job |
Involvement | The nature of involvement in the job |
Exp | Years of experience needed for the job |
HR | Name of HR who posted the job |
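The three schemas above can be produced from a single flat set of scraped rows. The following is a minimal sketch with Pandas, not the project's actual code: the sample rows, the flat input layout, and the choice of sequential integers as keys are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical scraped rows; the real scraper output may use other field names.
raw = pd.DataFrame([
    {"Designation": "Data Analyst", "Industry": "Fintech", "Location": "Bangalore",
     "Skills": "Python, SQL", "Name": "Acme", "Founded": 2012, "Employees": 350,
     "Involvement": "Full time", "Exp": "2-4", "HR": "A. Rao"},
    {"Designation": "Backend Engineer", "Industry": "Fintech", "Location": "Pune",
     "Skills": "Python, Flask", "Name": "Acme", "Founded": 2012, "Employees": 350,
     "Involvement": "Full time", "Exp": "3-5", "HR": "A. Rao"},
    {"Designation": "ML Engineer", "Industry": "Healthcare", "Location": "Remote",
     "Skills": "Python, Pandas", "Name": "Globex", "Founded": 2018, "Employees": 40,
     "Involvement": "Contract", "Exp": "1-3", "HR": "S. Mehta"},
])

# Company table: one row per distinct company, keyed by a sequential CompanyID.
company = (raw[["Name", "Founded", "Employees"]]
           .drop_duplicates()
           .reset_index(drop=True))
company.insert(0, "CompanyID", range(1, len(company) + 1))

# Details table: one row per job posting, keyed by a sequential DetailID.
details = raw[["Skills", "Involvement", "Exp", "HR"]].reset_index(drop=True)
details.insert(0, "DetailID", range(1, len(details) + 1))

# Jobs table: job fields plus foreign keys into the other two tables.
jobs = raw[["Designation", "Industry", "Location", "Skills", "Name"]].copy()
jobs.insert(0, "JobID", range(1, len(jobs) + 1))
jobs["DetailID"] = details["DetailID"].values
# Look up CompanyID by company name, then drop the name from the jobs table.
jobs = (jobs.merge(company[["CompanyID", "Name"]], on="Name", how="left")
            .drop(columns="Name"))
```

Matching companies by name alone is a simplification; two different companies with the same name would collapse into one row.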
The following methodology was used to accomplish the project objectives:
- Data Scraping: Job data was obtained from Instahyre using Python's Selenium library, based on criteria such as job titles, locations, and company names.
- Data Conversion: Using Pandas, the scraped data was transformed into three tables: jobs, company, and details.
- Data Cleaning and Preparation: The data cleaning phase involved eliminating irrelevant data, handling missing values, standardizing formats, removing duplicates, cleaning text, managing outliers, converting types, checking consistency, normalizing categorical data, and ensuring data integrity.
- Company Classification: Companies were classified into five classes (Class0 to Class4) based on employee count and company age using K-Means clustering. The optimal number of clusters was determined using the Elbow Method.
- User-Friendly Interface: A search bar built with the Flask web framework lets users look up skills, and the FuzzyWuzzy library corrects input errors. Search results display the most common experience level, industry, and company class related to the skill, and the number of available job opportunities.
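The company classification step can be sketched with scikit-learn as follows. This is an illustration under stated assumptions rather than the project's model: the data is synthetic, and the log-scaling and standardization of the two features are choices made here, not confirmed by the source. Only the two features (employee count and company age), the use of K-Means, the Elbow Method, and the five classes come from the description above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the company table (assumption: real values differ).
rng = np.random.default_rng(0)
employees = rng.integers(10, 20000, size=200)
age_years = rng.integers(1, 60, size=200)

# Assumed preprocessing: log-scale the skewed employee counts, then
# standardize so both features contribute comparably to distances.
features = np.column_stack([np.log10(employees), age_years])
X = StandardScaler().fit_transform(features)

# Elbow Method: record the inertia (within-cluster sum of squares) for each k
# and look for the point where adding clusters stops paying off.
inertias = {}
for k in range(1, 11):
    inertias[k] = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_

# Fit the final model with the five clusters reported in the methodology.
model = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)
company_class = [f"Class{label}" for label in model.labels_]
```

K-Means cluster labels are arbitrary, so Class0 to Class4 carry no inherent ordering; any ranking by size or age has to be assigned after inspecting the cluster centers.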
1). Webpage with HTML/CSS:
- Challenge: Design a webpage using HTML/CSS.
  Learning: Learn HTML structure, CSS styling.
2). User Text Processing with FuzzyWuzzy:
- Challenge: Process user text using FuzzyWuzzy.
  Learning: Understand text manipulation, fuzzy matching.
3). Backend with Flask, Webpage Interaction:
- Challenge: Create Flask backend, connect to webpage.
  Learning: Grasp Flask basics, dynamic content.
4). Model Deployment Exploration:
- Challenge: Explore deployment options.
3. This webpage showcases a comprehensive list of jobs related to specific skills entered by users, along with supplementary information.
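The supplementary information shown for a skill (most common experience level, industry, company class, and number of openings) can be computed with a few Pandas calls. This is a sketch only: the sample rows, the comma-separated `Skills` format, the `CompanyClass` column joined onto the jobs data, and the `summarize_skill` helper are assumptions, not the project's code.

```python
import pandas as pd

# Hypothetical joined view of the jobs, details, and company tables.
jobs = pd.DataFrame({
    "Skills": ["Python, SQL", "Python, Flask", "Java", "Python, Pandas"],
    "Industry": ["Fintech", "Fintech", "E-commerce", "Healthcare"],
    "Exp": ["2-4", "2-4", "3-5", "1-3"],
    "CompanyClass": ["Class1", "Class1", "Class3", "Class0"],
})


def summarize_skill(jobs, skill):
    """Summarize the postings that list the given skill."""
    wanted = skill.strip().lower()
    # Split each comma-separated skill string and compare whole skill names,
    # so that "java" does not match "javascript".
    has_skill = jobs["Skills"].apply(
        lambda s: wanted in [part.strip().lower() for part in s.split(",")]
    )
    hits = jobs[has_skill]
    if hits.empty:
        return {"skill": skill, "openings": 0}
    return {
        "skill": skill,
        "openings": len(hits),
        # mode() returns the most frequent value(s); take the first one.
        "top_experience": hits["Exp"].mode().iloc[0],
        "top_industry": hits["Industry"].mode().iloc[0],
        "top_company_class": hits["CompanyClass"].mode().iloc[0],
    }


summary = summarize_skill(jobs, "python")
```

When several values tie for most frequent, `mode()` returns them in sorted order, so taking the first is a deterministic but arbitrary tie-break.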
App_video.mp4
- Python Software Foundation. (2022). Python Language Reference, version 3.10. Retrieved from https://docs.python.org/3/reference/index.html
- Selenium with Python: https://selenium-python.readthedocs.io/
- Wikipedia contributors. (2023, April 13). Flask (web framework). In Wikipedia, The Free Encyclopedia. Retrieved 15:48, April 22, 2023, from "https://en.wikipedia.org/wiki/Flask_(web_framework)"
- Scikit-learn developers. (n.d.). Clustering. Retrieved April 22, 2023, from "https://scikit-learn.org/stable/modules/clustering.html"
- FuzzyWuzzy. (n.d.). FuzzyWuzzy Documentation. Retrieved April 22, 2023, from "https://pypi.org/project/fuzzywuzzy/"