Primary language: Jupyter Notebook. License: MIT.

LetsScrape

1. Overview

This is a project by Tim Kapferer and Tim Petersen. Our goal is to compare the range of restaurants available on Lieferando (www.lieferando.de) in different German cities. We use Selenium (https://selenium-python.readthedocs.io) to gather the data and matplotlib (https://matplotlib.org) to visualise the results.

2. Installation

General requirements

Installation process

Open a terminal in the project folder and enter

 $ pip install -r requirements.txt

You need to change the PATH in scraper.py in lines 17 and 18 depending on your operating system.

Attention:

  • On Mac you may need to grant special permissions. If that is the case, you will be shown an Apple help page that explains how.
  • On Mac and Linux you need to give the absolute path to the driver; on Windows both relative and absolute paths work.

PATH = "PATH TO YOUR DRIVER"  
driver = webdriver.Chrome(PATH)  
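
Because Mac and Linux need an absolute path, it can help to normalise whatever path you enter before handing it to Selenium. The following is only a sketch: `resolve_driver_path` and `make_driver` are hypothetical helpers that are not part of scraper.py, and `make_driver` assumes a Selenium 4 installation, where the driver path is passed through a `Service` object rather than as a positional argument.

```python
import os


def resolve_driver_path(raw_path: str) -> str:
    """Return an absolute path for the driver executable (hypothetical helper)."""
    # Expand "~" and turn relative paths into absolute ones.
    return os.path.abspath(os.path.expanduser(raw_path))


def make_driver(raw_path: str):
    """Create a Chrome driver, assuming Selenium 4 is installed."""
    # Imported lazily so the path helper also works without Selenium.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    service = Service(executable_path=resolve_driver_path(raw_path))
    return webdriver.Chrome(service=service)
```

With an older Selenium version the two lines shown above from scraper.py should work unchanged once PATH is absolute.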

3. How to use the program

Run lets_scrape.py

 $ python lets_scrape.py

and then follow the dialogue. Up to two PDF files should then appear in LetsScrape/Output.

Troubleshooting and Remarks

Entering addresses

  • Lieferando might not find the city. In that case you can enter an actual address, add " Hbf" to the city name if the city has a central station, or add the zip code to the city name.

  • For city names that occur multiple times in Germany, you can add the zip code to the city name.
  • In very rare cases the city is not entered correctly and Lieferando uses the last used address.
    In that case please restart the program. Selenium is instructed to wait until the address has been entered, but occasionally this does not work.
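
The workarounds above can be thought of as a list of search strings to try in turn. The sketch below is purely illustrative: `address_candidates` is a hypothetical function, not part of the program, and the order and format of the candidates (zip code in front of the city name) are assumptions.

```python
from typing import List, Optional


def address_candidates(city: str, zip_code: Optional[str] = None) -> List[str]:
    """Build search strings to try for a city (hypothetical helper)."""
    city = city.strip()
    candidates = []
    if zip_code:
        # A zip code disambiguates city names that occur more than once.
        candidates.append(f"{zip_code} {city}")
    candidates.append(city)
    # Fallback for cities that have a central station ("Hauptbahnhof").
    candidates.append(f"{city} Hbf")
    return candidates


print(address_candidates("Osnabrück"))
print(address_candidates("Münster", "48143"))
```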

Time of gathering data

  • If you gather the data at a time when the restaurants are closed, there won't be any values for the delivery time.
    As a consequence, delivery times and their averages cannot be interpreted.
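
To illustrate why averages are unavailable in that case, here is a minimal sketch of averaging while skipping missing values. It assumes that missing delivery times are represented as `None`; the actual representation in the program may differ, and `average_or_none` is a hypothetical name.

```python
from statistics import mean
from typing import Iterable, Optional


def average_or_none(values: Iterable[Optional[float]]) -> Optional[float]:
    """Average the values that are present; return None if there are none."""
    present = [v for v in values if v is not None]
    return mean(present) if present else None


# Two open restaurants and one closed restaurant (no delivery time).
print(average_or_none([30, None, 40]))
# All restaurants closed: there is nothing to average.
print(average_or_none([None, None]))
```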

Bug

  • If you enter the same city multiple times, there will be multiple plots of the same type. We didn't address this bug, because entering the same city multiple times is not an intended use.
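
One possible way to avoid the duplicate plots would be to remove repeated city names before scraping. This is only a sketch of that idea, not something the program currently does; `unique_cities` is a hypothetical helper, and comparing names case-insensitively is an assumption.

```python
from typing import Iterable, List


def unique_cities(cities: Iterable[str]) -> List[str]:
    """Drop repeated city names while keeping the original order."""
    seen = set()
    result = []
    for city in cities:
        name = city.strip()
        key = name.casefold()  # compare case-insensitively
        if key and key not in seen:
            seen.add(key)
            result.append(name)
    return result


print(unique_cities(["Berlin", "berlin ", "Osnabrück"]))
```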

4. Motivation, Goal and Result

Motivation

Early on we agreed to use web scraping and Matplotlib for the project. Since we are students, we most likely make up a big part of the customers of Lieferando (and other delivery services), and we noticed that Lieferando's offer in Osnabrück is quite homogeneous: mostly pizza and burgers. We therefore decided to scrape information from Lieferando.de to compare Osnabrück with other German cities.

Goal

Our goal is to compare different locations in Germany in terms of diversity, delivery costs, delivery time, ratings etc. and to visualise the results in a convincing and informative way.

Result and Progress

Kitchen Categories

As development moved on we realised that Lieferando.de has quite a liberal policy on which kitchens (cuisine tags) a restaurant may choose. For example, a restaurant can have the kitchens "Italienisch, Italienische Pizza, Pasta", which can all be categorized as Italian. So we decided to define our own categories to reduce the number of kitchens, but it is still possible to use the original kitchens from Lieferando.de.

images/Heatmap.png

As you can see, there can be a large number of kitchens, and some types of kitchens are rare and not representative (black squares). So we decided to sort those small sub-kitchens into broader kitchens.

  • Asiatisch: Sushi, Japanisch, Poke bowl, Indisch, Thailändisch, Curry, Vietnamesisch, Chinesisch, Koreanisch, Dumplings, Indonesisch, Pakistanisch
  • Orientalisch: Türkisch, Döner, Falafel, 100% Halal, Persisch, Türkische Pizza, Arabisch, Syrisch, Libanesisch, Gyros, Griechisch, Balkanküche
  • Italienisch: Italienische Pizza, Pasta
  • Amerikanisch: Amerikanisch, Burger, Amerikanische Pizza, Hot Dog, Sandwiches, Mexikanisch, Argentinisch, Spareribs
  • Vegetarisch: Vegan
  • Cafe & Kuchen: Eiscreme, Snacks, Kuchen, Nachspeisen, Backwaren, Café, Frühstück
This is what the heatmap looks like when our categorization is applied:

images/HeatmapClear.png
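
The categorization listed above boils down to a lookup from Lieferando's kitchens to the broader ones. The following sketch shows one way to express it; the names `CUSTOM_KITCHENS` and `categorize` are hypothetical, and the choice to leave unknown kitchens unchanged is an assumption, not necessarily what the program does.

```python
# Broader kitchen -> Lieferando kitchens sorted into it (taken from the list above).
CUSTOM_KITCHENS = {
    "Asiatisch": ["Sushi", "Japanisch", "Poke bowl", "Indisch", "Thailändisch",
                  "Curry", "Vietnamesisch", "Chinesisch", "Koreanisch",
                  "Dumplings", "Indonesisch", "Pakistanisch"],
    "Orientalisch": ["Türkisch", "Döner", "Falafel", "100% Halal", "Persisch",
                     "Türkische Pizza", "Arabisch", "Syrisch", "Libanesisch",
                     "Gyros", "Griechisch", "Balkanküche"],
    "Italienisch": ["Italienische Pizza", "Pasta"],
    "Amerikanisch": ["Amerikanisch", "Burger", "Amerikanische Pizza", "Hot Dog",
                     "Sandwiches", "Mexikanisch", "Argentinisch", "Spareribs"],
    "Vegetarisch": ["Vegan"],
    "Cafe & Kuchen": ["Eiscreme", "Snacks", "Kuchen", "Nachspeisen", "Backwaren",
                      "Café", "Frühstück"],
}

# Inverted lookup: Lieferando kitchen -> broader kitchen.
TO_CUSTOM = {sub: broad for broad, subs in CUSTOM_KITCHENS.items() for sub in subs}


def categorize(kitchens):
    """Map a restaurant's kitchens to the broader ones, without duplicates."""
    # Kitchens that are not in the lookup are kept as they are (assumption).
    return sorted({TO_CUSTOM.get(k, k) for k in kitchens})


print(categorize(["Italienisch", "Italienische Pizza", "Pasta"]))
```

In this sketch the restaurant from the example above ends up with the single kitchen "Italienisch".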

Plots

We decided to use several plots to compare cities and to show the information of interest. The plots fall into three categories.

One City
  • Pie Plot: Illustrates the number of kitchens in the city. We recommend using our custom kitchen categories, but we also included a feature that collects rare kitchens in 'Others'. The plot tends to be cleaner with the custom kitchens, because 'Others' is then smaller.
  • Bar Plot: Illustrates the number of kitchens or the average of delivery time, delivery cost, minimum order cost or ratings. Both the custom and Lieferando's kitchens work fine, but for big cities like Munich and Berlin labels might overlap when Lieferando's kitchens are used.
Two Cities
  • Bar Plot for Differences: Illustrates the differences between two cities in the number of kitchens or in the average of delivery time, delivery cost, minimum order cost or ratings. Bar plots again cope quite well, so both the custom and Lieferando's kitchens are fine (except, again, for large cities).
Multiple Cities
  • 3D Plot with multiple bars: Illustrates the number of kitchens of multiple cities. We recommend using the custom kitchens, because the plot gets too messy otherwise. In general we recommend the heatmap over this plot. This plot is intended as a cover sheet picture, but it can still provide a good overview.
  • Heatmap: Illustrates the number of kitchens or the average of delivery time, delivery cost, minimum order cost or ratings for multiple cities in a heatmap. It looks better with the custom kitchen tags, as you can see above.
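
The 'Others' grouping mentioned for the pie plot can be sketched as follows. This is an illustration under assumptions: `collapse_rare` is a hypothetical function, and the 5% threshold is made up for the example and is not the value used by the program.

```python
from typing import Dict


def collapse_rare(counts: Dict[str, int], min_share: float = 0.05,
                  other_label: str = "Others") -> Dict[str, int]:
    """Merge kitchens whose share is below min_share into one bucket."""
    total = sum(counts.values())
    result = {}
    others = 0
    for kitchen, n in counts.items():
        if total and n / total >= min_share:
            result[kitchen] = n
        else:
            others += n
    if others:
        # The bucket is only added when something was actually merged.
        result[other_label] = others
    return result


counts = {"Pizza": 50, "Burger": 45, "Sushi": 3, "Hot Dog": 2}
print(collapse_rare(counts))
```

The resulting dictionary could then be passed to a pie plot, with the values as wedge sizes and the keys as labels.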

To see what the plots look like and how to interpret them, we recommend the Jupyter notebook.

5. Structure

We structured our program in five Python files.

  • lets_scrape.py - User Interface and main file
  • scraper.py - Scrape the information of interest on lieferando.de
  • ui_helper.py - Helper functions for the User Interface
  • visualization.py - Compute the plots using Matplotlib
  • data_helper - Helper functions to format and change data

For an overview of how we store the gathered data, please take a look at the Jupyter notebook.