Introduction

This is a web scraping project, scrapping stepstones.de using python and beautiful soup. This scraps only for jobs posted newer than 24 hours or seven days, which can be chose by user.

Run scraper.py for the initiating the scrapper.

UML DIAGRAM

Activity Diagram alt_text

Sequence Diagram alt_text

Use case Diagram alt-text

Metrics

Sonarcube

Codacy

Clean Code Development

Naming Rules

  1. Descriptive and Intention revealing variable names.
  2. Pronounceable and Searchablr names.

Data Structures

  1. Data Structure exposes data and have no behaviours.

Objects

  1. Objects expose behaviour and exposes no data.

Functions

  1. Function should be small and concentrate on only one job.

  2. Functions should have few arguments. (less than 3)

  3. Function should have no side effects.

  4. Fucntions should also have decriptive names.

Tests

  1. Only one assert per test.

  2. Should be as clean as production code.

  3. should be easy to run.

Error Handling

  1. Never mix erro handling and the code.

  2. use exceptions instead of returning error codes.

  3. first code "try catch finally" -> Structured code.

  4. Always throws exceptions with constants.

Build Management

pybuilder has been used as build management tool. https://pybuilder.github.io/

Commands:

sudo apt-get install pybuilder

Inside the project directory:

pyb --start-project

for pychram Ide integration:

add use_plugin('python.pycharm') in build.py run pyb pycharm_generate

These are the basic initiation commands. build.py file is as below.

IDE

PyCharm shortcuts

  1. double shift - search
  2. alt + enter - Show intention actions and quick-fixes
  3. F2 - Navigate between code issues 4.shift + F2 - Jump to the next or previous highlighted error.
  4. ctrl + / - line comment
  5. shift + F10 - run current file
  6. shift + F9 - debug current file
  7. ctrl + f8 - toogle line breakpoint
  8. ctrl + alt + shift + f8 - toogle temporary line breakpoint
  9. ctrl + shift + f8 - View Breakpoint

Functional Programming

  1. Final Data structure:

In src/main/userfunctions/job_scraper.py

Dict is used as below code,

jobs = {} title_job = job_title.text jobs[title_job] = {}

company_job = company_name.text jobs[title_job]['company'] = company_job

posted_time_job = time_posted.get_text() jobs[title_job]['time_posted'] = posted_time_job

final_link = "https://www.stepstone.de" + job_link.attrs['href'] jobs[title_job]['job_link'] = final_link

Output of Final data dtructure alt-text

  1. side-effect free function

The concepts behind functional programming requires functions to be stateless, and rely only on their given inputs to produce an output. The functions that meet the criteria are called pure functions. The benefit of using pure functions over impure (non-pure) functions is the reduction of side effects.

In src/main/userfunctions/duration_check.py alt-text

Result: alt-text

It is an exapmle of side-effect free function.