This is a web scraping project, scrapping stepstones.de using python and beautiful soup. This scraps only for jobs posted newer than 24 hours or seven days, which can be chose by user.
Run scraper.py for the initiating the scrapper.
Sonarcube
Codacy
Naming Rules
- Descriptive and Intention revealing variable names.
- Pronounceable and Searchablr names.
Data Structures
- Data Structure exposes data and have no behaviours.
Objects
- Objects expose behaviour and exposes no data.
Functions
-
Function should be small and concentrate on only one job.
-
Functions should have few arguments. (less than 3)
-
Function should have no side effects.
-
Fucntions should also have decriptive names.
Tests
-
Only one assert per test.
-
Should be as clean as production code.
-
should be easy to run.
Error Handling
-
Never mix erro handling and the code.
-
use exceptions instead of returning error codes.
-
first code "try catch finally" -> Structured code.
-
Always throws exceptions with constants.
pybuilder has been used as build management tool. https://pybuilder.github.io/
Commands:
sudo apt-get install pybuilder
Inside the project directory:
pyb --start-project
for pychram Ide integration:
add use_plugin('python.pycharm')
in build.py
run pyb pycharm_generate
These are the basic initiation commands. build.py file is as below.
PyCharm shortcuts
- double
shift
- search alt + enter
- Show intention actions and quick-fixesF2
- Navigate between code issues 4.shift + F2
- Jump to the next or previous highlighted error.ctrl + /
- line commentshift + F10
- run current fileshift + F9
- debug current filectrl + f8
- toogle line breakpointctrl + alt + shift + f8
- toogle temporary line breakpointctrl + shift + f8
- View Breakpoint
- Final Data structure:
In src/main/userfunctions/job_scraper.py
Dict is used as below code,
jobs = {}
title_job = job_title.text
jobs[title_job] = {}
company_job = company_name.text
jobs[title_job]['company'] = company_job
posted_time_job = time_posted.get_text()
jobs[title_job]['time_posted'] = posted_time_job
final_link = "https://www.stepstone.de" + job_link.attrs['href']
jobs[title_job]['job_link'] = final_link
Output of Final data dtructure
- side-effect free function
The concepts behind functional programming requires functions to be stateless, and rely only on their given inputs to produce an output. The functions that meet the criteria are called pure functions. The benefit of using pure functions over impure (non-pure) functions is the reduction of side effects.
In src/main/userfunctions/duration_check.py
It is an exapmle of side-effect free function.