
bike-sharing

Data Collection

src/data-preprocessing

Overview

  • create a database
  • set up the script on a server
  • run the script automatically with a cron job

Prerequisites

  • Python 3.6
  • Libraries:
    • requests
    • psycopg2

Install the packages:

cd src/data-preprocessing
pip install -r requirements.txt

Scripts

SQL Script create_bikeDB.sql to create the database schema

Create a database in which the data queried by the scripts is stored.
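For instance, the schema file could be applied to a fresh PostgreSQL database with psycopg2 along the lines of the sketch below; the connection parameters and the database name are placeholders, not values from this repository.

    # Sketch: apply create_bikeDB.sql to a PostgreSQL database via psycopg2.
    # Host, database name, user and password are placeholders; use your own credentials.
    import psycopg2

    conn = psycopg2.connect(host="localhost", dbname="bike_data",
                            user="bike", password="secret")
    with conn, conn.cursor() as cur:
        with open("create_bikeDB.sql") as f:
            cur.execute(f.read())  # runs the CREATE TABLE statements from the schema file
    conn.close()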

Script query_bike_apis.py is used to query provider API data

Sends API requests to retrieve the current locations of all bikes from nextbike, lidlbike and mobike in Berlin (inner circle) and stores them in a single database.
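The overall pattern of the script is roughly the following; the endpoint URL, the JSON layout, the table name and the config.DATABASE dictionary are illustrative assumptions, not the actual provider APIs or repository code.

    # Sketch of the query-and-store pattern; endpoint, JSON fields and table name
    # are illustrative assumptions, not the real provider APIs.
    import datetime
    import requests
    import psycopg2
    import config  # API keys and database credentials, see config-example.py

    PROVIDER_URL = "https://example.com/bikes"  # placeholder endpoint

    def store_bike_locations():
        resp = requests.get(PROVIDER_URL, timeout=30)
        resp.raise_for_status()
        bikes = resp.json().get("bikes", [])

        conn = psycopg2.connect(**config.DATABASE)  # assumed credentials dict
        with conn, conn.cursor() as cur:
            for bike in bikes:
                cur.execute(
                    "INSERT INTO bike_locations (provider, bike_id, lat, lng, queried_at)"
                    " VALUES (%s, %s, %s, %s, %s)",
                    ("example", bike["id"], bike["lat"], bike["lng"],
                     datetime.datetime.utcnow()),
                )
        conn.close()

    if __name__ == "__main__":
        store_bike_locations()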

Script query_nextbike_stations.py is used to query the stations of nextbike

Config File: Add a config.py file to src/data-preprocessing containing the API keys for the Deutsche Bahn API (https://developer.deutschebahn.com/store/) and the database credentials (see config-example.py for an example).
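A minimal config.py could look like the sketch below; the variable names here are assumptions, the actual names expected by the scripts are given in config-example.py.

    # Hypothetical config.py layout; the real variable names are in config-example.py.
    DB_API_KEY = "your-deutsche-bahn-api-key"  # from https://developer.deutschebahn.com/store/

    DATABASE = {
        "host": "localhost",
        "dbname": "bike_data",
        "user": "bike",
        "password": "secret",
    }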

Run the script automatically

Set up a cron job that runs the scripts at regular intervals. For example, the following setup

  • runs the query_bike_apis.py script every 4 minutes
  • runs the query_nextbike_stations.py script once a day at 8 AM
  • runs a cleaning script (/src/clean_script.py) once a day at 11 PM, deleting all unnecessary rows from the database.

Cron jobs

    */4 * * * * python3 [PATH TO FOLDER]/src/query_bike_apis.py
    0 8 * * * python3 [PATH TO FOLDER]/src/query_nextbike_stations.py
    0 23 * * * python3 [PATH TO FOLDER]/src/clean_script.py

Query other cities or providers

To query APIs for different cities, the src/data-preprocessing/query_bike_apis.py script has to be adapted accordingly. To query other providers, this documentation is a good source of information.
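One possible way to adapt it is to parameterise the queried area, e.g. with a bounding box per city that the script uses to filter or request bike locations; the dictionary below is a suggestion with approximate coordinates, not code from this repository.

    # Suggested (not actual) city configuration: approximate bounding boxes
    # that an adapted query_bike_apis.py could use to filter bike locations.
    CITIES = {
        "berlin":  {"lat_min": 52.45, "lat_max": 52.56, "lng_min": 13.29, "lng_max": 13.49},
        "hamburg": {"lat_min": 53.52, "lat_max": 53.60, "lng_min": 9.93, "lng_max": 10.06},
    }

    def in_city(bike, city):
        """Return True if a bike's lat/lng lies inside the city's bounding box."""
        box = CITIES[city]
        return (box["lat_min"] <= bike["lat"] <= box["lat_max"]
                and box["lng_min"] <= bike["lng"] <= box["lng_max"])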

For access to the Lime bike API, add phone_no to config.py and follow the steps in lime_access.py (three manual steps required).

Data Analysis

src/analysis

Jupyter notebooks to analyse the data.

  • preprocess.ipynb contains the preprocessing steps from the raw data to a usable format (an illustrative cleaning step is sketched after this list).

    • raw.csv contains the data from the database
    • preprocessed.csv contains the data with added columns and fixed lat / lng
    • routed.csv contains the data with distance and waypoints
    • cleaned.csv is the cleaned routed dataset (implausible data is removed)
    • pseudonomysed.csv is the anonymized, cleaned data, following this standard
    • pseudonomysed_raw.csv is the anonymized data (NOT cleaned).
  • analysis.ipynb includes analysis about provider and bike specific data

  • pseudonomysed.ipynb includes analysis using the anonymized dataset (without information on providers).
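As a rough illustration of one of the preprocessing steps (not the notebook's actual code), the removal of implausible rows between routed.csv and cleaned.csv might look like this with pandas; the column names distance and duration_min are assumptions.

    # Illustrative cleaning step, not the notebook's actual code: drop implausible
    # trips from the routed dataset before writing cleaned.csv.
    import pandas as pd

    routed = pd.read_csv("routed.csv")

    # Column names "distance" and "duration_min" are assumptions for this sketch:
    # keep trips with a positive distance that last less than a day.
    cleaned = routed[(routed["distance"] > 0) & (routed["duration_min"] < 24 * 60)]

    cleaned.to_csv("cleaned.csv", index=False)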