dba-scraper

A Python 3 based web scraper.

This code is an update to the previous scraper written in Python 2.7.

There are some differences in library imports and calls. To start, set up the Python 3 environment:

$ python3 -m venv env

$ source env/bin/activate

We will be using urllib3 to handle the page requests, so we have to install it first:

(env)$ pip install urllib3

Then install the Beautiful Soup package:

(env)$ pip install beautifulsoup4

The URL we are scraping is from the Danish website DBA:

https://www.dba.dk/saelger/privat/dba/5683282/?page=1

This is the first page of listings for this advertiser.
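
As a rough sketch of how these pieces fit together, the snippet below builds the page URL, downloads a page with urllib3, and lists the links that Beautiful Soup finds in it. The helper names (page_url, fetch_html, extract_links) are made up for illustration, and the sketch assumes that the advertiser's other pages use the same URL with a different page number. It is not necessarily how the scraper in this repository is written.

```python
import urllib3
from bs4 import BeautifulSoup

# Advertiser URL from this README, without the page number.
BASE_URL = "https://www.dba.dk/saelger/privat/dba/5683282/"


def page_url(page: int) -> str:
    """Build the URL for one results page (assumes ?page=N pagination)."""
    return f"{BASE_URL}?page={page}"


def fetch_html(url: str) -> str:
    """Download a page with urllib3 and return its body as text."""
    http = urllib3.PoolManager()
    response = http.request("GET", url)
    if response.status != 200:
        raise RuntimeError(f"unexpected HTTP status {response.status}")
    return response.data.decode("utf-8", errors="replace")


def extract_links(html: str) -> list:
    """Return the href of every <a> tag that has one, in document order."""
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.find_all("a", href=True)]


# Offline demo on a small hand-written HTML string (not real DBA markup).
sample = '<ul><li><a href="/first">First</a></li><li><a href="/second">Second</a></li></ul>'
print(page_url(1))
print(extract_links(sample))
```

To try it against the live site, call extract_links(fetch_html(page_url(1))). Keep in mind that this only sees the HTML the server sends back, not anything added later by JavaScript.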


Selenium

I have added Selenium to this project in order to scrape web pages that rely on JavaScript. What happens here is that the web page loads first, and only then does the JavaScript request the ad content.

Therefore, we need a slightly different method than Beautiful Soup alone.