Web scraper to get information about posted jobs in the US from Indeed.com

Indeed.com Jobs Scraping App

Overview

A program that scrapes and stores job postings in the United States from www.indeed.com.

The scraper collects the following information for each posting:

  • original ID generated by Indeed
  • job title (job_title)
  • posting date (job_date)
  • location (job_loc)
  • short description (job_summary)
  • salary (or salary range) in a list format (job_salary)
  • url of the job (job_url)
  • company name (company_name)
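
As a rough illustration of the fields listed above, one posting could be modelled as a small record type. This is only a sketch, not the project's actual data structure: the class name `JobRecord`, the field name `job_id` (the README does not name the ID field), the numeric type of the salary list, and all example values are assumptions.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class JobRecord:
    """One scraped posting; field names follow the list above."""
    job_id: str  # hypothetical name: the README does not name the ID field
    job_title: str
    job_date: str
    job_loc: str
    job_summary: str
    job_url: str
    company_name: str
    # Salary or salary range kept as a list; numeric elements are an assumption.
    # An empty list stands for "no salary posted".
    job_salary: List[float] = field(default_factory=list)


# Placeholder values for illustration only, not real scraped data.
example = JobRecord(
    job_id="abc123",
    job_title="Auditor",
    job_date="2022-06-01",
    job_loc="Austin, TX",
    job_summary="Perform internal audits.",
    job_url="https://www.indeed.com/",
    company_name="Example Corp",
    job_salary=[50000.0, 70000.0],
)

row = asdict(example)  # plain dict, convenient as one row of a CSV dump
```

Keeping the salary as a list lets a single value and a range share one column: a single salary is a one-element list and a range is a two-element list.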

Getting Started

  1. Install all required packages from requirements.txt.
    $ pip install -r requirements.txt

How to use

  1. Assign search parameters in parameters.py:
  • positions should be a list of strings with the position names or keywords to search for. Even for a single keyword, keep it in a list: positions = ["auditor"]
  2. Run app.py:
    $ python3 app.py
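
The sketch below shows how a `positions` list like the one above might be turned into one search URL per keyword. It is not taken from the project: the helper `build_search_url`, the `BASE_URL` endpoint, and the `q` and `start` query parameters are assumptions about how the search could be addressed.

```python
from urllib.parse import urlencode

# As in parameters.py: always a list, even for a single keyword.
positions = ["auditor", "data analyst"]

BASE_URL = "https://www.indeed.com/jobs"  # assumed search endpoint


def build_search_url(position: str, start: int = 0) -> str:
    """Return a search URL for one keyword.

    `q` (keyword) and `start` (paging offset) are assumed parameter names.
    """
    query = {"q": position}
    if start:
        query["start"] = start
    return f"{BASE_URL}?{urlencode(query)}"


# One URL per keyword in the list.
search_urls = [build_search_url(p) for p in positions]
```

A loop like this is one plausible reason for the list requirement: iterating over a bare string such as "auditor" would yield single characters rather than one keyword.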

Functionality:

  1. Scraping jobs by the search keywords set in parameters.py.
  2. Cleaning / formatting data.
  3. Each scraping session saves its results as a CSV data dump in the data_dumps/ folder.
  4. Each step of the scraping is logged to log.txt, and the outcome is printed to the console.
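
Steps 2 to 4 above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in dumping.py or logger.py: the function names, the `dump_<timestamp>.csv` file naming, and the salary-parsing rule are all hypothetical, and the standard-library `csv` module is used here instead of pandas to keep the sketch dependency-free.

```python
import csv
import logging
import re
import sys
from datetime import datetime
from pathlib import Path


def make_logger(log_path: str = "log.txt") -> logging.Logger:
    """Logger that writes each message to a file and echoes it to the console."""
    logger = logging.getLogger("scraper_sketch")
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        fmt = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        for handler in (logging.FileHandler(log_path),
                        logging.StreamHandler(sys.stdout)):
            handler.setFormatter(fmt)
            logger.addHandler(handler)
    return logger


def parse_salary(text: str) -> list:
    """Extract the numbers from a salary string as a list of floats.

    A range yields two values, a single salary one, and text without
    digits an empty list.
    """
    matches = re.findall(r"\d[\d,]*(?:\.\d+)?", text)
    return [float(m.replace(",", "")) for m in matches]


def dump_to_csv(rows: list, folder: str = "data_dumps") -> Path:
    """Write one CSV file per session, named by timestamp, and return its path."""
    if not rows:
        raise ValueError("nothing to dump")
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"dump_{datetime.now():%Y%m%d_%H%M%S}.csv"
    with path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return path
```

A session would then call `parse_salary` on each raw salary string, pass the cleaned rows to `dump_to_csv`, and report each step through the logger returned by `make_logger`, for example `logger.info("saved %s", path)`.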

Architecture:

  1. app.py - entry point
  2. main.py - the main workflow of the program
  3. indeed_com_scraper.py - scraping functionality module
  4. dumping.py - data cleaning / formatting module + saving data dumps
  5. logger.py - logging functionality
  6. parameters.py - scraping parameters, kept in a separate module for easy access

Additional:

  1. db_scheme.py or db_scheme.sql - initial database setup
  2. requirements.txt - required Python packages

Requirements:

Python 3

Packages:

  • pandas 1.4.2
  • requests 2.28.0
  • beautifulsoup4 4.11.1