
HAR Datasets

This repository aims to provide a unified interface to datasets for the task of accelerometer-based Human Activity Recognition (HAR). The philosophy is to catalogue as many datasets as possible from a wide variety of recording conditions, which differ in terms of data format, feature extraction, label space, sampling frequency, device location/orientation, etc., for the purpose of understanding the efficacy of transfer learning, online learning, lifelong learning, data representation, and feature extraction across a large collection of datasets.

Project Structure

This project follows the Cookiecutter Data Science template with the aim of facilitating reproducible models and results. The majority of commands are executed with the make command, and we also provide a high-level data loading interface.

Proposed Format

All data will be translated to a simple CSV format with the following columns:

time, subject_id, sequence_id, activity_labels, fold_id, x, y, z

where time is in seconds and of type double; subject_id is an integer identifier of the subject; sequence_id identifies a contiguous activity sequence (one subject may therefore perform a task several times); x, y, z are the x-, y-, and z-axis data (whether acceleration, magnetometer, or gyroscope); and activity_labels are the labels of the dataset. Finally, fold_id is an identifier that specifies the fold in which the data should appear (negative values will only be used in training, consistent with scikit-learn's PredefinedSplit module).
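As a minimal sketch of how such a file might be consumed (the file name anguita2013_accel.csv is hypothetical, and pandas and scikit-learn are assumed to be installed):

import pandas as pd
from sklearn.model_selection import PredefinedSplit

# Hypothetical file name; any dataset built to the proposed format would work.
df = pd.read_csv('anguita2013_accel.csv')

# fold_id maps directly onto scikit-learn's PredefinedSplit: rows with a
# negative fold_id are never placed in a test fold.
split = PredefinedSplit(test_fold=df['fold_id'].values)
for train_idx, test_idx in split.split():
    train, test = df.iloc[train_idx], df.iloc[test_idx]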

We have made the decision to keep our data format relatively simple since we hope it will provide a language-agnostic interface to the data so that users of, for example, R, MATLAB, Python, C++, etc. can use the data once it has been built. For datasets with several views into movement (e.g. with volunteers wearing several devices, or with IMUs providing not only acceleration data but also gyroscope and magnetometer data), we have made the decision that each 'view' be contained in a separate file, since in some cases the data are sampled at different rates. However, the data may be merged together using the subject_id, time, and sequence_id fields of the file above.
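For example, two views stored in separate files (hypothetical file names below) could be merged back together with pandas, assuming the two views share the same sampling grid:

import pandas as pd

# Hypothetical file names for two views of the same dataset.
accel = pd.read_csv('banos2015_accel.csv')
gyro = pd.read_csv('banos2015_gyro.csv')

# Join on the shared identifier columns; the suffixes distinguish the x, y, z
# columns of each view. If the views were sampled at different rates, an
# approximate join (e.g. pandas.merge_asof on time) would be needed instead.
merged = accel.merge(
    gyro,
    on=['subject_id', 'sequence_id', 'time'],
    suffixes=('_accel', '_gyro'),
)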

Current Datasets

The following table enumerates the datasets accounted for in this repository, sorted by the surname of the first author of the paper.

| First Author | Dataset Name | Paper (URL) | Data Description (URL) | Data Download (URL) | Year | fs (Hz) | Accel | Gyro | Mag | #Subjects | #Activities | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Anguita | anguita2013 | A Public Domain Dataset for Human Activity Recognition Using Smartphones | Description | Download | 2013 | 50 | yes | yes | | 30 | 6 | |
| Banos | banos2012 | A benchmark dataset to evaluate sensor displacement in activity recognition | Description | Download | 2012 | 50 | yes | yes | yes | 17 | 33 | |
| Banos | banos2015 | mHealthDroid: a novel framework for agile development of mobile health applications | Description | Download | 2015 | 50 | yes | yes | yes | 10 | 12 | |
| Barshan | barshan2014 | Recognizing daily and sports activities in two open source machine learning environments using body-worn sensor units | Description | Download | 2014 | 25 | yes | yes | yes | 8 | 19 | |
| Bruno | bruno2013 | Analysis of Human Behavior Recognition Algorithms based on Acceleration Data | Description | Download | 2013 | 32 | yes | | | 16 | 14 | Notes |
| Casale | casale2015 | Personalization and user verification in wearable systems using biometric walking patterns | | | 2012 | 52 | yes | | | 7 | 15 | |
| Chavarriaga | opportunity | The Opportunity challenge: A benchmark database for on-body sensor-based activity recognition | Description | Download | 2012 | 30 | yes | yes | yes | 12 | 7 | Several annotation tracks. |
| Chen | utdmhad | UTD-MHAD: A Multimodal Dataset for Human Action Recognition Utilizing a Depth Camera and a Wearable Inertial Sensor | Description | Download | 2015 | 50 | yes | yes | | 9 | 21 | |
| Chereshnev | hugadb | HuGaDB: Human Gait Database for Activity Recognition from Wearable Inertial Sensor Networks | Description | Download | 2017 | ~56 | yes | yes | | 18 | 12 | |
| Kwapisz | wisdm | Activity Recognition using Cell Phone Accelerometers | Description | Download | 2012 | 20 | yes | | | 29 | 6 | |
| Micucci | micucci2017 | UniMiB SHAR: A Dataset for Human Activity Recognition Using Acceleration Data from Smartphones | Description | Download | 2017 | 50 | yes | | | 30 | 8 | Notes |
| Ortiz | ortiz2015 | Human Activity Recognition on Smartphones with Awareness of Basic Activities and Postural Transitions | Description | Download | 2015 | 50 | yes | yes | | ? | 7 | With postural transitions |
| Reiss | pamap2 | Introducing a new benchmarked dataset for activity monitoring | Description | Download | 2012 | 100 | yes | yes | yes | 10 | 12 | |
| Shoaib | shoaib2014 | Fusion of Smartphone Motion Sensors for Physical Activity Recognition | Description | Download | 2014 | 50 | yes | yes | yes | 7 | 7 | |
| Siirtola | siirtola2012 | Recognizing human activities user-independently on smartphones based on accelerometer data | Description | Download | 2012 | 40 | yes | | | 7 | 5 | |
| Stisen | stisen2015 | Smart Devices are Different: Assessing and Mitigating Mobile Sensing Heterogeneities for Activity Recognition | Description | Download | 2015 | 50-200 | yes | | | 9 | 6 | |
| Sztyler | sztyler2016 | On-body localization of wearable devices: An investigation of position-aware activity recognition | Description | Download | 2016 | 50 | yes | yes | yes | 15 | 8 | Many other sensors also (video, light, sound, etc.) |
| Twomey | spherechallenge | The SPHERE Challenge: Activity Recognition with Multimodal Sensor Data | Description | Download | 2016 | 20 | yes | | | 20 | 20 | |
| Ugulino | ugulino2012 | Wearable Computing: Accelerometers’ Data Classification of Body Postures and Movements | Description | Download | 2012 | 50 | yes | | | 4 | 5 | |
| Vavoulas | mobiact | The MobiAct Dataset: Recognition of Activities of Daily Living using Smartphones | Description | Download | 2016 | 100 | yes | | | 57 | 9 | |
| Zhang | uschad | USC-HAD: A Daily Activity Dataset for Ubiquitous Activity Recognition Using Wearable Sensors | Description | Download | 2012 | 100 | yes | yes | | 15 | 12 | |

Contributing

We will gladly accept contributions to this repository in any form; we particularly welcome additional datasets, new feature extraction processes, view representations, and bug fixes.

Adding new Datasets

New datasets can be added by contacting me via email or by submitting a new issue (preferred). Simply provide the information required to populate a new row in the table above. If you have a transformer that will convert the data to the preferred format, please attach it too. If not, I will attempt to write a converter for the data, but this may take some time.

Update via Pull Request

Two steps must be performed for a Pull Request to be accepted: 1. update the table above; and 2. add the transformer to the repository. These steps are outlined in more detail below:

Update the Table

The table above can be updated by adding a row with the following information:

| AuthorName | DatasetName | [PaperName](PaperURL) | [Description](DescriptionURL) | [Download](DownloadURL) | PublicationYear | SamplingFrequency | HasAccelerometer | HasGyroscope | HasMagnetometer | NumSubjects | NumActivities | Notes |

Please insert the new row alphabetically based on the first author's surname, then by publication date if there is a tie. Note that the name of the dataset is immutable; only in exceptional circumstances will it be changed.

Add Transformer

A new data transformer should be placed in src/data/<DatasetName>.py, where <DatasetName> matches the second element of the newly inserted row. Contained within this file should be a function called <DatasetName> which accepts the path to the raw data as an argument and returns pandas dataframes. Using the spherechallenge dataset as an example, the file src/data/spherechallenge.py will contain the following:

def spherechallenge(input_path):
    # Dataset-specific parsing logic; load_sphere_challenge_data stands in
    # for whatever reads the raw download into the required dataframes.
    data = load_sphere_challenge_data(input_path)
    return data

It is important that there is consistency between the name of the dataset in the table above, the name of the file in the src directory, and the name of the function, since the module importer reads the dataset information from this table and dynamically loads the transformation functions. In other words, the function must be importable as follows:

from spherechallenge import spherechallenge
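For illustration only, the dynamic loading described above might look something like the following sketch (load_transformer and the example path are assumptions, not the repository's actual implementation):

import importlib

def load_transformer(dataset_name):
    # Import the module named after the dataset and return the function of
    # the same name, e.g. load_transformer('spherechallenge') returns the
    # spherechallenge() function defined above.
    module = importlib.import_module(dataset_name)
    return getattr(module, dataset_name)

transformer = load_transformer('spherechallenge')
data = transformer('/path/to/raw/spherechallenge')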

Adding New Feature Representations

We have implemented several feature extraction processes in the src/features directory, along with interfaces to map these features to the above datasets. New feature extractors should be relatively straightforward to add since they will typically operate on a matrix of acceleration data and return a vector. As a simple example, one may extract the mean, standard deviation, range, min, and max values as follows:

import numpy as np

# Summary statistics computed per axis: mean, standard deviation,
# range (peak-to-peak), minimum, and maximum.
stat_funcs = [np.mean, np.std, np.ptp, np.min, np.max]

def extract_stat_features(data):
    # data is an (n_samples, n_axes) array; each statistic is computed
    # column-wise and the results are concatenated into a single feature vector.
    return np.concatenate([func(data, axis=0) for func in stat_funcs])
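As a quick usage sketch (with random data standing in for a real window of x, y, z samples):

# A hypothetical window of 128 tri-axial samples.
window = np.random.randn(128, 3)

features = extract_stat_features(window)
print(features.shape)  # (15,): 5 statistics x 3 axes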

Adding Transformers

Several pre-processing techniques are often applied to accelerometer data. For example, it is common to separate the 'body' and 'gravity' components from each other, to compute the magnitude of the data, and so on.
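As a sketch of what such a transformer might look like (the 0.3 Hz Butterworth cut-off is a common choice in the HAR literature rather than a value fixed by this repository), the gravity component can be estimated with a low-pass filter and the magnitude computed per sample:

import numpy as np
from scipy.signal import butter, filtfilt

def split_body_gravity(data, fs, cutoff=0.3, order=3):
    # Low-pass filter each axis to estimate the gravity component; the
    # residual is taken as the body-motion component.
    b, a = butter(order, cutoff / (fs / 2.0), btype='low')
    gravity = filtfilt(b, a, data, axis=0)
    body = data - gravity
    return body, gravity

def magnitude(data):
    # Euclidean norm of each (x, y, z) sample.
    return np.linalg.norm(data, axis=1)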

Project Organization

├── LICENSE
├── Makefile           <- Makefile with commands like `make data` or `make train`
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── features       <- The representation of the processed data.
│   ├── processed      <- The intermediate data, transformed to the desired format.
│   └── raw            <- The original, immutable data dump. All datasets have unique subdirectories.
│
├── docs               <- A default Sphinx project; see sphinx-doc.org for details
│
├── models             <- Trained and serialized models, model predictions, or model summaries
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
│
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in reporting
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `pip freeze > requirements.txt`
│
├── src                <- Source code for use in this project.
│   ├── __init__.py    <- Makes src a Python module
│   │
│   ├── data           <- Scripts to download or generate data
│   │   └── make_dataset.py
│   │
│   ├── features       <- Scripts to turn raw data into features for modeling
│   │   └── build_features.py
│   │
│   ├── models         <- Scripts to train models and then use trained models to make
│   │   │                 predictions
│   │   ├── predict_model.py
│   │   └── train_model.py
│   │
│   └── visualization  <- Scripts to create exploratory and results oriented visualizations
│       └── visualize.py
│
└── tox.ini            <- tox file with settings for running tox; see tox.testrun.org