zackkhan/JobScrapingApp_Indeed.com

Web scraper to get information about posted jobs in the US from Indeed.com

Python

Indeed.com Jobs Scraping App

Overview

A program that scrapes and stores job postings in the United States from www.indeed.com.

Collects the following information for each posting:

original ID generated by Indeed
job title (job_title)
posting date (job_date)
location (job_loc)
short description (job_summary)
salary (or salary range) in a list format (job_salary)
url of the job (job_url)
company name (company_name)
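As an illustration of the fields listed above, one scraped posting could be held in a record like the following. This is only a sketch: the class name JobRecord and the field name job_id are hypothetical (the list above does not name the ID field), and every value is made-up sample data.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class JobRecord:
    """One scraped posting; field names follow the list above."""
    job_id: str          # hypothetical name for the ID generated by Indeed
    job_title: str
    job_date: str
    job_loc: str
    job_summary: str
    job_url: str
    company_name: str
    # Salary or salary range kept as a list, e.g. [50000, 70000]
    job_salary: List[int] = field(default_factory=list)


# Made-up sample values, for illustration only
record = JobRecord(
    job_id="abc123",
    job_title="Auditor",
    job_date="2022-06-01",
    job_loc="Austin, TX",
    job_summary="Example summary text.",
    job_url="https://www.indeed.com/viewjob?jk=abc123",
    company_name="Example Corp",
    job_salary=[50000, 70000],
)

# A plain dict is convenient for writing a CSV row later
row = asdict(record)
```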

Getting Started

Install all required packages from requirements.txt.
$ pip install -r requirements.txt

How to use

Set the search parameters in parameters.py:

positions must be a list of strings holding the position names or keywords to search for. Keep it a list even for a single term: positions = ["auditor"]
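The reason for the list requirement is easy to see in code: looping over a bare string yields single characters, not search terms. The check below is a minimal sketch of how that mistake could be caught early. validate_positions is a hypothetical helper, not part of the project.

```python
# parameters.py style setting: always a list, even for one keyword
positions = ["auditor"]


def validate_positions(value):
    """Return a cleaned list of search terms or raise on a bad value.

    Hypothetical helper for illustration; not part of the project.
    """
    if isinstance(value, str):
        # list("auditor") would give ['a', 'u', 'd', ...], so reject bare strings
        raise TypeError('positions must be a list, e.g. ["auditor"]')
    if not isinstance(value, list) or not value:
        raise ValueError("positions must be a non-empty list of strings")
    if not all(isinstance(p, str) and p.strip() for p in value):
        raise ValueError("every position must be a non-empty string")
    return [p.strip() for p in value]


search_terms = validate_positions(positions)
```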

Run app.py:
$ python3 app.py

Functionality:

Scrapes jobs matching the search keywords.
Cleans and formats the scraped data.
Saves the results of each scraping session as a CSV data dump in the data_dumps/ folder.
Logs each scraping step to log.txt and prints the outcome to the console.
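The cleaning and dumping steps could look roughly like the sketch below. It rests on several assumptions: the salary text format ("$50,000 - $70,000 a year"), the function names parse_salary and dump_to_csv, and the timestamped file name are all hypothetical, and the standard csv module is used here in place of pandas so the example has no dependencies.

```python
import csv
import re
from datetime import datetime
from pathlib import Path


def parse_salary(text):
    """Pull whole-dollar amounts out of salary text as a list of ints.

    Assumes amounts look like "$50,000"; cents, if any, are dropped.
    """
    amounts = re.findall(r"\$(\d[\d,]*)", text or "")
    return [int(a.replace(",", "")) for a in amounts]


def dump_to_csv(rows, folder="data_dumps"):
    """Write one scraping session to a new timestamped CSV file.

    rows is a non-empty list of dicts that share the same keys.
    """
    if not rows:
        raise ValueError("nothing to dump")
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    path = out_dir / f"jobs_{stamp}.csv"  # hypothetical naming scheme
    with path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    return path
```

A session would then call something like dump_to_csv([{"job_title": "Auditor", "job_salary": parse_salary("$50,000 - $70,000 a year")}]) once scraping has finished.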

Architecture:

app.py - entry point
main.py - the main workflow of the program
indeed_com_scraper.py - scraping functionality module
dumping.py - data cleaning / formatting module; also saves the data dumps
logger.py - logging functionality
parameters.py - scraping parameters, kept in a separate module for easy access
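For the logging module, writing each message to log.txt while also printing it to the console can be done with two handlers from the standard logging package. The sketch below is an assumption about how such a module might be arranged, not the project's actual logger.py. The function name get_logger and the message format are hypothetical.

```python
import logging
import sys


def get_logger(name="indeed_scraper", log_file="log.txt"):
    """Return a logger that writes to a file and echoes to the console."""
    logger = logging.getLogger(name)
    if logger.handlers:
        # Already configured: avoid attaching duplicate handlers
        return logger

    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep messages from being printed twice
    fmt = logging.Formatter("%(asctime)s %(levelname)s %(message)s")

    file_handler = logging.FileHandler(log_file, encoding="utf-8")
    file_handler.setFormatter(fmt)
    logger.addHandler(file_handler)

    console_handler = logging.StreamHandler(sys.stdout)
    console_handler.setFormatter(fmt)
    logger.addHandler(console_handler)
    return logger
```

Other modules would then call get_logger() once and use logger.info(...) at each scraping step.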

Additional:

db_scheme.py or db_scheme.sql - initial database setup
requirements.txt - required Python packages
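The contents of db_scheme.py and db_scheme.sql are not shown here, so the following is only a guess at what an initial setup might involve. It assumes SQLite, a hypothetical table named jobs, a hypothetical job_id column, and the salary list stored as JSON text. The real scheme may use a different database and layout.

```python
import json
import sqlite3

COLUMNS = [
    "job_id", "job_title", "job_date", "job_loc",
    "job_summary", "job_salary", "job_url", "company_name",
]

# Hypothetical schema built from the fields listed in the overview
SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    job_id       TEXT PRIMARY KEY,
    job_title    TEXT NOT NULL,
    job_date     TEXT,
    job_loc      TEXT,
    job_summary  TEXT,
    job_salary   TEXT,
    job_url      TEXT,
    company_name TEXT
);
"""


def init_db(path=":memory:"):
    """Open the database and create the jobs table if it is missing."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def save_job(conn, job):
    """Insert one posting, replacing any earlier row with the same job_id."""
    row = {col: job.get(col) for col in COLUMNS}
    row["job_salary"] = json.dumps(job.get("job_salary") or [])
    placeholders = ", ".join(f":{col}" for col in COLUMNS)
    conn.execute(
        f"INSERT OR REPLACE INTO jobs ({', '.join(COLUMNS)}) VALUES ({placeholders})",
        row,
    )
    conn.commit()
```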

Requirements:

Python 3

Packages:

pandas 1.4.2
requests 2.28.0
beautifulsoup4 4.11.1