This is a project by Tim Kapferer and Tim Petersen. Our goal is to compare the range of restaurants available on Lieferando (www.lieferando.de) in different German cities. We use Selenium (https://selenium-python.readthedocs.io) to gather the data and Matplotlib (https://matplotlib.org) to visualise the results.
- Working Python environment with Python >= 3.7 (https://www.python.org)
- Or a working Conda environment with Python >= 3.7 (Install Conda)
- Chromedriver (https://chromedriver.chromium.org/downloads)
Install the Chromedriver version that matches your browser version. You can find your Chrome version under Settings > About Chrome.
Open a terminal in the project folder and enter
$ pip install -r requirements.txt
You need to change the PATH in scraper.py (lines 17 and 18) depending on your operating system.
Attention:
- On Mac you may need to grant special permissions. If that is the case, you will be directed to an Apple help page.
- On Mac and Linux you need to give the absolute path to the driver; on Windows both relative and absolute paths work.
PATH = "PATH TO YOUR DRIVER"
driver = webdriver.Chrome(PATH)
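The call above passes the driver path as a positional argument. Newer Selenium 4 releases expect the path to be wrapped in a Service object instead, so if the version installed from requirements.txt is that recent, the two lines may need adapting. The following is a minimal sketch under that assumption; resolve_driver_path and make_driver are hypothetical helper names and not part of scraper.py. The path helper also covers the absolute-path requirement on Mac and Linux.

```python
import os


def resolve_driver_path(raw_path):
    """Expand '~' and turn a relative driver path into an absolute one."""
    return os.path.abspath(os.path.expanduser(raw_path))


def make_driver(raw_path):
    """Start Chrome through a Service wrapper (assumes Selenium >= 4)."""
    # Imported here so the path helper also works where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    return webdriver.Chrome(service=Service(resolve_driver_path(raw_path)))


if __name__ == "__main__":
    PATH = "PATH TO YOUR DRIVER"  # placeholder, as in scraper.py
    print(resolve_driver_path(PATH))
```

With this sketch, lines 17 and 18 of scraper.py would become PATH = "..." followed by driver = make_driver(PATH).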
Run lets_scrape.py
$ python lets_scrape.py
and then follow the dialogue. As a result, up to two PDF files should appear in LetsScrape/Output.
- Lieferando might not find the city. In that case, use an actual address, add " Hbf" to the city name if the city has a central station, or add the zip code to the city name.
- For city names that occur more than once in Germany, add the zip code to the city name.
- In very rare cases the city is not entered correctly and Lieferando uses the last used address instead. In that case please restart the program. Selenium is instructed to wait for the input to work, but sometimes it does not do so.
- If you gather the data at a time when the restaurants are closed, there are no values for the delivery time, so delivery times and their averages cannot be interpreted.
- If you enter the same city multiple times, there will be multiple plots of the same type. We did not address this bug, because entering the same city multiple times is not an intended use.
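The fallbacks for a city that Lieferando does not find can be written down as an ordered list of search strings to try. This is only an illustrative sketch: candidate_queries is a hypothetical helper that does not exist in the project, and placing the zip code in front of the city name is an assumption, since the notes above do not specify the order.

```python
def candidate_queries(city, zip_code=None):
    """Return search strings to try, in order, until Lieferando finds the city."""
    city = city.strip()
    queries = [city]
    # Assumption: the zip code is written in front of the city name.
    if zip_code:
        queries.append(f"{zip_code} {city}")
    # Only helpful for cities that have a central station (Hauptbahnhof).
    queries.append(f"{city} Hbf")
    return queries


if __name__ == "__main__":
    for query in candidate_queries("Osnabrück"):
        print(query)
```

In the dialogue of lets_scrape.py you would enter these strings by hand, one after another, until the city is recognised.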
Early on we agreed to use web scraping and Matplotlib for the project. As students, we most likely make up a big part of the customers of Lieferando (and other delivery services), and we noticed that the Lieferando offer in Osnabrück is quite homogeneous: mostly pizza and burgers. We therefore decided to scrape information from Lieferando.de to compare Osnabrück with other German cities.
Our goal is to compare different locations in Germany in terms of diversity, delivery costs, delivery time, ratings etc. and to visualise the results in a convincing and informative way.
As development moved on, we realised that Lieferando.de has a quite liberal policy that lets restaurants choose their own kitchens (cuisine tags). For example, a restaurant can have the kitchens "Italienisch, Italienische Pizza, Pasta", which can all be categorized as Italian. So we decided to define our own categories to reduce the number of kitchens, but of course it is still possible to use the kitchens from Lieferando.de.
As you can see, there might be plenty of kitchens, and some types of kitchens are rare and not representative (black squares). So we decided to sort those small sub-kitchens into broader kitchens.
- Asiatisch: Sushi, Japanisch, Poke bowl, Indisch, Thailändisch, Curry, Vietnamesisch, Chinesisch, Koreanisch, Dumplings, Indonesisch, Pakistanisch
- Orientalisch: Türkisch, Döner, Falafel, 100% Halal, Persisch, Türkische Pizza, Arabisch, Syrisch, Libanesisch, Gyros, Griechisch, Balkanküche
- Italienisch: Italienische Pizza, Pasta
- Amerikanisch: Amerikanisch, Burger, Amerikanische Pizza, Hot Dog, Sandwiches, Mexikanisch, Argentinisch, Spareribs
- Vegetarisch: Vegan
- Cafe & Kuchen: Eiscreme, Snacks, Kuchen, Nachspeisen, Backwaren, Café, Frühstück
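The grouping above can be expressed as a lookup table from Lieferando kitchens to custom categories. The sketch below shows one possible way to do this; the names CUSTOM_CATEGORIES, TAG_TO_CATEGORY and to_custom_categories are illustrative assumptions, and the project's own helper code may be organised differently. Kitchens that are not listed keep their original name.

```python
CUSTOM_CATEGORIES = {
    "Asiatisch": ["Sushi", "Japanisch", "Poke bowl", "Indisch", "Thailändisch",
                  "Curry", "Vietnamesisch", "Chinesisch", "Koreanisch",
                  "Dumplings", "Indonesisch", "Pakistanisch"],
    "Orientalisch": ["Türkisch", "Döner", "Falafel", "100% Halal", "Persisch",
                     "Türkische Pizza", "Arabisch", "Syrisch", "Libanesisch",
                     "Gyros", "Griechisch", "Balkanküche"],
    "Italienisch": ["Italienische Pizza", "Pasta"],
    "Amerikanisch": ["Amerikanisch", "Burger", "Amerikanische Pizza", "Hot Dog",
                     "Sandwiches", "Mexikanisch", "Argentinisch", "Spareribs"],
    "Vegetarisch": ["Vegan"],
    "Cafe & Kuchen": ["Eiscreme", "Snacks", "Kuchen", "Nachspeisen",
                      "Backwaren", "Café", "Frühstück"],
}

# Reverse lookup: Lieferando kitchen -> custom category.
TAG_TO_CATEGORY = {
    tag: category
    for category, tags in CUSTOM_CATEGORIES.items()
    for tag in tags
}


def to_custom_categories(tags):
    """Map a restaurant's Lieferando kitchens to a sorted list of custom categories."""
    # Unlisted kitchens (e.g. "Italienisch" itself) are kept unchanged.
    return sorted({TAG_TO_CATEGORY.get(tag, tag) for tag in tags})


if __name__ == "__main__":
    print(to_custom_categories(["Italienisch", "Italienische Pizza", "Pasta"]))
```

Collecting the result in a set means a restaurant tagged with several Italian sub-kitchens is counted only once as Italienisch.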
We decided to use several plots to compare cities and to show the information of interest. The plot types fall into three categories.
- Pie Plot: Illustrates the number of kitchens in the city. We recommend using our custom kitchen categories, but we also included a feature that collects rare kitchens in 'Others'. The plot is cleaner with the custom kitchens, because 'Others' will then be smaller.
- Bar Plot: Illustrates the number of kitchens or the average delivery time, delivery cost, minimum order cost or rating. Both the custom and Lieferando's kitchens work fine, but for big cities like Munich and Berlin the labels might overlap when Lieferando's kitchens are used.
- Bar Plot for Differences: Illustrates the differences between two cities in the number of kitchens or in the average delivery time, delivery cost, minimum order cost or rating. Here again bar plots cope quite well, so both the custom and Lieferando's kitchens are fine (except, again, for large cities).
- 3D Plot with multiple bars: Illustrates the number of kitchens of multiple cities. We recommend using the custom kitchens, because the plot gets too messy otherwise. In general we recommend the heatmap over this plot. This plot is intended as a cover sheet picture, but it can still provide a good overview.
- Heatmap: Illustrates the number of kitchens or the average delivery time, delivery cost, minimum order cost or rating for multiple cities in a heatmap. It looks better with the custom kitchen tags, as you can see above.
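To illustrate how rare kitchens can be collected in 'Others' for the pie plot, here is a minimal sketch. The function names and the share threshold are assumptions made for this example; the rule actually used in visualization.py may differ. Matplotlib is imported only inside the plotting function, so the grouping logic can be tried on its own.

```python
from collections import Counter


def group_rare(counts, min_share=0.05, label="Others"):
    """Merge kitchens whose share of all entries is below min_share into one slice."""
    total = sum(counts.values())
    grouped = Counter()
    for kitchen, n in counts.items():
        # Keep a kitchen as its own slice only if its share is large enough.
        key = kitchen if total and n / total >= min_share else label
        grouped[key] += n
    return dict(grouped)


def plot_pie(counts, city):
    """Draw a pie plot of the grouped kitchen counts for one city."""
    import matplotlib.pyplot as plt  # requires Matplotlib to be installed

    grouped = group_rare(counts)
    fig, ax = plt.subplots()
    ax.pie(list(grouped.values()), labels=list(grouped.keys()), autopct="%1.0f%%")
    ax.set_title(city)
    return fig
```

A higher min_share gives a tidier pie but hides more detail in 'Others', which is why the custom kitchen categories are the better starting point.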
To see what the plots look like and how to interpret them, we recommend the Jupyter notebook.
We structured our program in five Python files.
- lets_scrape.py - User interface and main file
- scraper.py - Scrapes the information of interest from lieferando.de
- ui_helper.py - Helper functions for the user interface
- visualization.py - Computes the plots using Matplotlib
- data_helper.py - Helper functions to format and transform data
For an overview of how we store the gathered data, please take a look at the Jupyter notebook.