This repository contains datasets used in the skforecast library. It also contains datasets used in related tutorials.

License: BSD 3-Clause "New" or "Revised" License (BSD-3-Clause)

skforecast-datasets

Every dataset comes with a short description and a reference to its original source. Datasets can be downloaded directly from the repository or loaded with the fetch_dataset() function from the skforecast library.

from skforecast.datasets import fetch_dataset
data = fetch_dataset(name="h2o")

Datasets

h2o

h2o_exog

fuel_consumption

items_sales

air_quality_valencia

air_quality_valencia_no_missing

website_visits

bike_sharing

  • url: https://raw.githubusercontent.com/skforecast/skforecast-datasets/main/data/bike_sharing_dataset_clean.csv
  • sep: ','
  • index_col: date_time
  • date_format: %Y-%m-%d %H:%M:%S
  • freq: H
  • description: Hourly usage of the bike share system in Washington, D.C. during 2011 and 2012. In addition to the number of users per hour, information about weather conditions and holidays is available. The following modifications have been applied to the original data: columns renamed with more descriptive names; weather categories renamed; the 'heavy_rain' category merged into 'rain'; temperature, humidity and wind variables denormalized; 'date_time' variable created and set as index; missing values imputed by forward filling.
  • source: Fanaee-T, Hadi. (2013). Bike Sharing Dataset. UCI Machine Learning Repository. https://doi.org/10.24432/C5W894.
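The metadata fields listed for each dataset (url, sep, index_col, date_format, freq) correspond closely to what is needed to read the CSV with pandas. The following is a minimal sketch of that mapping, not the implementation used by fetch_dataset(). The helper name `load_dataset_csv`, the `users` column, and the sample values are hypothetical and serve only as an illustration; the real file would be read by passing its url as `source`.

```python
import io

import pandas as pd


def load_dataset_csv(source, index_col="date_time",
                     date_format="%Y-%m-%d %H:%M:%S", sep=",", freq=None):
    """Read a CSV and return a DataFrame indexed by parsed timestamps.

    `source` may be a URL, a local path, or a file-like object.
    This helper is hypothetical; it is not part of skforecast.
    """
    df = pd.read_csv(source, sep=sep)
    # Parse the timestamp column with the explicit format from the metadata.
    df[index_col] = pd.to_datetime(df[index_col], format=date_format)
    df = df.set_index(index_col).sort_index()
    if freq is not None:
        # Reindex onto a regular grid; timestamps absent from the file
        # become rows filled with NaN.
        df = df.asfreq(freq)
    return df


# Tiny in-memory stand-in for a real file (column name and values are made up).
sample = io.StringIO(
    "date_time,users\n"
    "2011-01-01 00:00:00,16\n"
    "2011-01-01 01:00:00,40\n"
    "2011-01-01 03:00:00,13\n"
)

# An offset object is used for the hourly frequency instead of a string alias,
# because the accepted alias strings differ between pandas versions.
df = load_dataset_csv(sample, freq=pd.offsets.Hour())
print(df)
```

In this sample the 02:00 row is missing from the input, so after the frequency is applied it appears with a NaN value. Gaps of this kind are what an imputation step such as forward filling would address.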

bike_sharing_extended_features

  • url: https://raw.githubusercontent.com/skforecast/skforecast-datasets/main/data/bike_sharing_extended_features.csv
  • sep: ','
  • index_col: date_time
  • date_format: %Y-%m-%d %H:%M:%S
  • freq: H
  • description: Hourly usage of the bike share system in Washington, D.C. during 2011 and 2012. In addition to the number of users per hour, information about weather conditions and holidays is available. The following modifications have been applied to the original data: columns renamed with more descriptive names; weather categories renamed; the 'heavy_rain' category merged into 'rain'; temperature, humidity and wind variables denormalized; 'date_time' variable created and set as index; missing values imputed by forward filling. The dataset has also been enriched with additional features: calendar-based variables (day of the week, hour of the day, month, etc.), indicators for sunlight, rolling temperature averages, and polynomial features generated from variable pairs. All cyclic variables are encoded using sine and cosine transformations.
  • source: Fanaee-T, Hadi. (2013). Bike Sharing Dataset. UCI Machine Learning Repository. https://doi.org/10.24432/C5W894.
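The description above mentions sine and cosine encoding of cyclic variables but does not give the formula used to build the dataset. The sketch below shows the common formulation, assuming a variable with a known period (for example, hour of the day with period 24). The function names are illustrative and do not come from the dataset or from skforecast.

```python
import math


def cyclical_encode(value, period):
    """Map a cyclic value (e.g. an hour in 0..23) to a (sin, cos) pair."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def distance(a, b):
    """Euclidean distance between two encoded points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


hour_0 = cyclical_encode(0, 24)
hour_12 = cyclical_encode(12, 24)
hour_23 = cyclical_encode(23, 24)

# As raw integers, 23 and 0 are far apart; on the circle they are neighbours,
# while 12:00 sits opposite midnight.
print(distance(hour_23, hour_0) < distance(hour_12, hour_0))
```

The point of using two columns instead of the raw integer is that the end of one cycle sits next to the start of the next, which a single numeric column cannot express.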

australia_tourism

  • url: https://raw.githubusercontent.com/skforecast/skforecast-datasets/main/data/australia_tourism.csv
  • sep: ','
  • index_col: date_time
  • date_format: %Y-%m-%d
  • freq: Q
  • description: Quarterly overnight trips (in thousands) from 1998 Q1 to 2016 Q4 across Australia. The tourism regions are formed through the aggregation of Statistical Local Areas (SLAs) which are defined by the various State and Territory tourism authorities according to their research and marketing needs.
  • source: Wang, E, D Cook, and RJ Hyndman (2020). A new tidy data structure to support exploration and modeling of temporal data, Journal of Computational and Graphical Statistics, 29:3, 466-478, doi:10.1080/10618600.2019.1695624.

uk_daily_flights

wikipedia_visits

vic_electricity

store_sales

bicimad

m4_daily

m4_hourly

ashrae_daily

bdg2_daily

bdg2_hourly