This is a simple crawler to create a score ranking of immobilier.
The script was used as a exercise to learn a lit bit more about asyncio
lib.
It has a simple crawler which know how to parse pages from https://www.immobilienscout24.de and also creates a ranking
based on a very rudimentar score calculator (tunned for my necessities).
The script try to combine information from immobilienscout24 and alson Google Places. The score is based on simple metrics, such as living space, number of rooms, rent, etc (from immobilienscout24), supermarkets, train station and subway stations (from Google Places).
For be able to retrieve informations from Google Places, you might want to sing up to development platform for creating an API KEY.
+-----------+----------+-------------------------------------------------------+----------------+---------------+-------------+--------------+-------+-----------------+----------------+-----------------+--------------+
| Id | Score | Address | Total rent (€) | Base rent (€) | Constructed | Living Space | Rooms | Condition | Train Stations | Subway Stations | Supermarkets |
+-----------+----------+-------------------------------------------------------+----------------+---------------+-------------+--------------+-------+-----------------+----------------+-----------------+--------------+
| 61675789 | 4856.069 | no_information Hessen Frankfurt_am_Main | 910.000 | 840.000 | 1955 | 65.000 | 2 | modernized | 1 | 7 | 15 |
+-----------+----------+-------------------------------------------------------+----------------+---------------+-------------+--------------+-------+-----------------+----------------+-----------------+--------------+
| 111642648 | 3338.906 | Zum Brommenhof Hessen Frankfurt_am_Main | 1190.000 | 1020.000 | 2003 | 56.000 | 2 | mint_condition | 4 | 1 | 10 |
+-----------+----------+-------------------------------------------------------+----------------+---------------+-------------+--------------+-------+-----------------+----------------+-----------------+--------------+
| 111646632 | 3050.105 | no_information Hessen Frankfurt_am_Main | 830.000 | 685.000 | 1910 | 46.000 | 2 | modernized | 1 | 4 | 9 |
+-----------+----------+-------------------------------------------------------+----------------+---------------+-------------+--------------+-------+-----------------+----------------+-----------------+--------------+
| 111642937 | 2982.590 | no_information Hessen Frankfurt_am_Main | 915.000 | 790.000 | 1908 | 66.000 | 3 | modernized | 1 | 0 | 11 |
+-----------+----------+-------------------------------------------------------+----------------+---------------+-------------+--------------+-------+-----------------+----------------+-----------------+--------------+
| 111655439 | 2887.943 | Kronberger Straße Hessen Frankfurt_am_Main | 1180.000 | 940.000 | 1955 | 58.000 | 2 | fully_renovated | 1 | 5 | 7 |
+-----------+----------+-------------------------------------------------------+----------------+---------------+-------------+--------------+-------+-----------------+----------------+-----------------+--------------+
| 111555673 | 2623.746 | Pfarrer-Perabo-Platz Hessen Frankfurt_am_Main | 0.000 | 900.000 | 1995 | 60.000 | 2 | fully_renovated | 1 | 0 | 9 |
+-----------+----------+-------------------------------------------------------+----------------+---------------+-------------+--------------+-------+-----------------+----------------+-----------------+--------------+
| 111288996 | 2582.065 | Pariser Straße Hessen Frankfurt_am_Main | 1183.000 | 990.000 | 2019 | 52.170 | 2 | first_time_use | 2 | 0 | 8 |
+-----------+----------+-------------------------------------------------------+----------------+---------------+-------------+--------------+-------+-----------------+----------------+-----------------+--------------+
| 111126026 | 2518.069 | Europa-Allee Hessen Frankfurt_am_Main | 1000.000 | 750.000 | 2017 | 38.600 | 1 | mint_condition | 2 | 0 | 8 |
+-----------+----------+-------------------------------------------------------+----------------+---------------+-------------+--------------+-------+-----------------+----------------+-----------------+--------------+
| 111325118 | 2451.290 | Kölner Str. Hessen Frankfurt_am_Main | 866.000 | 666.000 | 2014 | 36.000 | 1 | fully_renovated | 2 | 1 | 7 |
+-----------+----------+-------------------------------------------------------+----------------+---------------+-------------+--------------+-------+-----------------+----------------+-----------------+--------------+
| 111477560 | 2445.416 | Mühlheimer Straße Hessen Offenbach_am_Main | 0.000 | 900.000 | 2016 | 70.460 | 2 | mint_condition | 2 | 0 | 7 |
+-----------+----------+-------------------------------------------------------+----------------+---------------+-------------+--------------+-------+-----------------+----------------+-----------------+--------------+
| 110467290 | 2316.436 | no_information Hessen Frankfurt_am_Main | 1160.000 | 980.000 | 2018 | 57.000 | 2 | first_time_use | 1 | 3 | 5 |
+-----------+----------+-------------------------------------------------------+----------------+---------------+-------------+--------------+-------+-----------------+----------------+-----------------+--------------+
| 110794529 | 2231.756 | Hahnstraße Hessen Frankfurt_am_Main | 865.000 | 695.000 | 2014 | 40.000 | 1 | mint_condition | 1 | 0 | 7 |
+-----------+----------+-------------------------------------------------------+----------------+---------------+-------------+--------------+-------+-----------------+----------------+-----------------+--------------+
| 107130643 | 2190.063 | no_information Hessen Frankfurt_am_Main | 1130.000 | 925.000 | 1920 | 60.000 | 2 | well_kept | 1 | 5 | 3 |
+-----------+----------+-------------------------------------------------------+----------------+---------------+-------------+--------------+-------+-----------------+----------------+-----------------+--------------+
| 111179952 | 2117.082 | Westring Hessen Frankfurt_am_Main | 1016.920 | 757.370 | 2018 | 72.130 | 3 | no_information | 0 | 6 | 0 |
+-----------+----------+-------------------------------------------------------+----------------+---------------+-------------+--------------+-------+-----------------+----------------+-----------------+--------------+
| 110670210 | 2106.725 | Lyonerstraße Hessen Frankfurt_am_Main | 1103.000 | 873.000 | 2010 | 67.000 | 2 | mint_condition | 1 | 0 | 6 |
+-----------+----------+-------------------------------------------------------+----------------+---------------+-------------+--------------+-------+-----------------+----------------+-----------------+--------------+
Cloning the repo
mkdir ~/imscout
cd ~/imscout
git clone git@github.com:romulorosa/immobilien-crawler.git
This code was written and tested on Python >= 3.5. Make sure you have at least this version installed on your computer.
Creating virtualenv
virtualenv -p /usr/bin/python3.7 ~/venvs/imscout
source ~/venvs/imscout/bin/activate
Installing Python requirements
pip install -r requirements.txt
If you want to run this script as it is, you just need to get the URLS from your search at https://www.immobilienscout24.de and put them in a array. Here is one example:
crawler = ImmobilienScoutCrawler([
'https://www.immobilienscout24.de/expose/110448261/',
'https://www.immobilienscout24.de/expose/110794529/',
])
crawler.run()
You can also input the urls from stdin
python imscout.py https://www.immobilienscout24.de/expose/111667003 https://www.immobilienscout24.de/expose/111676616
If you are not familiar with Python environments you might want run it through Docker containers.
The first step is build the image
docker build -t immobilien-crawler .
Now you are able to run a container
docker run -it --rm immobilien-crawler python imscout.py https://www.immobilienscout24.de/expose/111667003 https://www.immobilienscout24.de/expose/109702102
You can specify your Google API key as a env var
docker run -it --env GOOGLE_MAPS_API_KEY=<your_api_key_here> --rm immobilien-crawler python imscout.py https://www.immobilienscout24.de/expose/111667003 https://www.immobilienscout24.de/expose/109702102
You can adjust the metrics used to create the rank as you wish. Everything that you need is make some adjustments on class ScoreCalculator
or event create your own one. If you want to add some new metrics in the class (not only adjust the formula already created) you will need to implement a new method on class Immobile
following the given pattern:
def _calculate_<your_metric_name>():
pass
Let's suppose that you want to add a metric called distance_from_job
. So the code should be something like this:
class ScoreCalculator:
FORMULAS = {
'space': lambda x, y, z: 50*x + 2.5*y + 1.4*z-datetime.now().year,
'price': lambda x: math.pow(1200/x, 1.2) * 100,
'places': lambda x: 100 * pow(x, 1.2),
'distance_from_job': lambda x: x * x
}
class Immobile:
GOOGLE_MAPS_API_KEY = ''
GOOGLE_PLACES_URL = 'https://maps.googleapis.com/maps/api/place/nearbysearch/json?location={},{}&radius={}&type={}&key={}' # noqa
[...]
def _calculate_distance_from_job(self, formula):
pass