The following instructions describe how to generate the data required for running the experiments conducted during this project.
All experiments were run on the Linux operating system, using Ubuntu 16.04 up to Ubuntu 19.10. Experiments require at least Python 3, preferably Python 3.7 or higher.
Our experiments rely on data files of Gigabytes in size. Since Canvas can not handle files of these sizes, we are forced to store our source data on external services.
Our data generating file, data_generator.py
, relies on Python 3.7.5. It can be run as follows: python3 data_generator.py
.
Besides creating the necessary data, this file will also look for any missing modules on the user's system and list these. These modules must be installed manually using pip3 install
.
Experiments can be run by navigating to the experiments
folder and running Python scripts from here. The results are both stored as .png
images in the results
folder, as well as shown as an interactive figure.
amount_stops_vs_delays_per_day.py
: Scatter plot with amount of stops made per day versus amount of delayed stops made per daycanceled_per_delay.py
: Scatter plot of the number of delayed stops versus the number of canceled trainscorrelation_trackdelays.py
: Time series of the total delay on station ASD(=Amsterdam Centraal)correlation_trackdelays.py
: Experiment descriptiondelay_1pm_histogram.py
: Histogram showing amount of delayed stops per weekday at 11:00 PMdelay_11pm_histogram.py
: Histogram showing amount of delayed stops per weekday at 1:00 PMdelay_5pm_histogram.py
: Histogram showing amount of delayed stops per weekday at 5:00 PMdelay_8am_histogram.py
: Histogram showing amount of delayed stops per weekday at 8:00 AMdelay_histogram2.py
: Histogram showing amount of delayed stops per weekdaydelay_histogram_log.py
: Experiment where log values are taken to prove that the overal delay (in minutes) follows a lognormal distribution (incl. ks test).delay_time_distribution.py
: Fit the exponential distribution over the frequency of x minutes delaydelayed_per_total.py
: Show amount of delayed stops versus the total number of stops for every company in the data set.delayed_train_station_percentage.py
: Experiment descriptionhourly.py
: Plot showing amount of (delayed) trains per hour of the daynormalize_wind_delay.py
: Plot to find the relation between the wind speed in the Netherlands and the number of delayed trains, both on a single day and normalized.normalized_rain_delay.py
: Plot to find the relation between the rain in the Netherlands and the number of delayed trains, both on a single day and normalized.overtime.py
: Plot showing amount of stops and delays per date (total)overtime_montues_diffs.py
: Histogram showing difference of stops and delays per Monday and Tuesdayovertime_weekday.py
: Experiment on amount of delayed stops per weekday, filtering out weekendsovertime_weekday_filtered.py
: Experiment on amount of delayed stops per weekday, with a ks test for normal distribution.overtime_weekend.py
: Plot showing amount of stops and delays per date (total), filtering out weekdays for smoother linesovertime_weekend_diffs.py
: Histogram showing difference of stops and delays per date (total) and attempting to prove that the difference follows a normal distribution (incl. ks test).station.py
: Plot showing amount of stops and delays per stationstation_relative.py
: Histogram showing amount of delays divided by stops per stationstation_relative_5.py
: Histogram showing amount of delays of more than 5 minutes divided by stops per stationtrain_type_delays.py
: Show a bar plot with the amount of delays for every carrier type of the NSweather_delay_rain.py
: Plot to find the relation between the rain in the Netherlands and the minutes of delay of all trains, both on a single day.weather_delay_wind.py
: Plot to find the relation between the windspeed in the Netherlands and the minutes of delay of all trains, both on a single day.weather_delayornot_rain.py
: Plot to find the relation between the rain in the Netherlands and the number of delayed trains, both on a single day.weather_delayornot_rain_percentage.py
: Plot to find the relation between the rain in the Netherlands and the number of delayed trains (percentage), both on a single day.weather_delayornot_wind.py
: Plot to find the relation between the rain in the Netherlands and the number of delayed trains, both on a single day.weather_delayornot_wind_percentage.py
: Plot to find the relation between the rain in the Netherlands and the number of delayed trains (percentage), both on a single day.weekday.py
: Plot showing amount of stops and delays per weekday (total)weekdays.py
: Plotting delay in minutes on weekdays (excluding cancelled trains)trajecten.py
: Gets all different train paths and trains of the ns datagraph.py
: Creates a graph based on the paths found in trajecten.py
Daan Vinken, Joël Buter, Ricardo van Aken, Jesse Postema