/data_airticket

dataset for air line ticket price

Primary LanguagePythonMIT LicenseMIT

Airline ticket price dataset

There are two airline ticket price datasets. Dataset1 contains 8 routes, dataset2 contains 12 routes as shown following:

dataset1 = ["BCN_BUD",  # route 1  Barcelona -> Budapest
          "BUD_BCN",  # route 2    Budapest -> Barcelona
          "CRL_OTP",  # route 3    Brussels -> Bucharest
          "MLH_SKP",  # route 4    Mulhouse -> Skopje
          "MMX_SKP",  # route 5    Sweden -> Skopje
          "OTP_CRL",  # route 6    Bucharest -> Brussels
          "SKP_MLH",  # route 7    Skopje -> Mulhouse
          "SKP_MMX"]  # route 8    Skopje -> Sweden
dataset2 = ["BGY_OTP", # route 1  Milan -> Bucharest
            "BUD_VKO", # route 2  Budapest -> Moscow
            "CRL_OTP", # route 3  Brussels -> Bucharest
            "CRL_WAW", # route 4  Brussels -> Warsaw
            "LTN_OTP", # route 5  London -> Bucharest
            "LTN_PRG", # route 6  London -> Prague
            "OTP_BGY", # route 7  Bucharest -> Milan
            "OTP_CRL", # route 8  Bucharest -> Brussels
            "OTP_LTN", # route 9  Bucharest -> London
            "PRG_LTN", # route 10 Prague -> London
            "VKO_BUD", # route 11 Moscow -> Budapest
            "WAW_CRL"] # route 12 Warsaw -> Brussels

All the files contained in the datasets are in json format. And the data are observed from November 09, 2015 to May 09, 2016.

Sample data entry:

the query data is the folder name
|-"ArrivalStationCode":"BUD"     # code for the arrival station 
|-"CurrentDate":"30\/12\/2015"   # departure date
|-"Date":"20151230"              # departure date
|-"DepartureStationCode":"BCN"   # code for departure station
|-"Flights":[{
    |-"CarrierCode":"W6"         # NA
    |-"FlightNumber":"2376"      # flight number
    |-"STD":"09:15"              # departure time
    |-"STA":"11:50"              # arrival time
    |-"ArrivalStationName":"Budapest"             # the city name for the arrival station
    |-"DepartureStationName":"Barcelona El Prat"  # the city name for the departure station
    |-"IsMACStation":"True"                       # NA
    |-"IsAirportChange":"False"}]                 # indicator whether need to change (all false in the dataset)
|-"HasSelection":"True"
|-"InMonth":"True"
|-"MinimumPrice":"€49.99"                         # price

Sample feature extraction code in python

|-sample_data_extraction.py

Extracted feature 1

feature 1: flight number -> dummy variables
feature 2: departure date interval from "20151109", because the first observed date is 20151109
feature 3: observed days before departure date
feature 4: minimum price before the observed date
feature 5: maximum price before the observed date
output: the date which has the minimum price from the observed date to departure date is set to 1;
        other entries set to 0

Extracted feature 2

feature 1: flight number -> dummy variables
feature 2: departure date interval from "20151109", because the first observed date is 20151109
feature 3: observed days before departure date
feature 4: minimum price before the observed date
feature 5: maximum price before the observed date
output: the ticket price for each entry

Announcement

This dataset is only used for research purpose, you cannot publish for commercial usage. All rights reserved.

Citation

@Misc{jun2016airdata,
  author =   {Jun Lu},
  title =    {Dataset for airline ticket price},
  howpublished = {\url{https://github.com/lujunzju/data_airticket}},
  year = {since 2016}
}

Lincense

MIT