Demonstrating advantages of sequential ML models over traditional ML 🔥

Sequence classification with Neural Networks: a primer

This repository demonstrates the advantages of RNNs and CNNs over traditional ML models on time-series data with outliers.

Task description

We're going to use the transport mode detection task as our running example. Given a time series of sensor data, the goal is to classify each time step with one of several predefined transport modes: walk, car, bike, etc.

For demonstration purposes, we will use only two modes, "walk" and "train", so that the task becomes binary classification.
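
To make that concrete, each sample pairs a sensor reading with a mode at every time step. A hypothetical example (the values below are made up for illustration; the real data is described in the next section):

```python
import numpy as np

# A hypothetical labelled sequence: device speed (m/s) at each time step,
# with a per-step label (0 = "walk", 1 = "train")
speeds = np.array([1.2, 1.4, 1.3, 9.5, 11.0, 10.7])
labels = np.array([0, 0, 0, 1, 1, 1])
```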

Data

The data is generated synthetically based on common-sense assumptions. Outliers in the data represent faulty sensor readings, which often happen in real life (e.g. wrong geo-positions or spurious acceleration values). For simplicity, a single feature representing the speed of a device is used. The Data generation notebook describes the data generation methodology in detail.
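
As a rough sketch of the idea (the actual methodology lives in the Data generation notebook; the segment layout, mean speeds, and outlier rate below are illustrative assumptions, not the notebook's values):

```python
import numpy as np

rng = np.random.default_rng(42)

def generate_sample(n_steps=100, outlier_rate=0.05):
    """One speed sequence: a walk segment followed by a train segment, with outliers."""
    switch = rng.integers(1, n_steps)  # time step where the mode changes
    labels = np.r_[np.zeros(switch), np.ones(n_steps - switch)].astype(int)
    # Illustrative mean speeds (m/s): ~1.5 while walking, ~20 on a train
    speeds = np.where(labels == 0,
                      rng.normal(1.5, 0.5, n_steps),
                      rng.normal(20.0, 3.0, n_steps))
    # Faulty sensor readings: overwrite a few steps with implausible speeds
    outliers = rng.random(n_steps) < outlier_rate
    speeds[outliers] = rng.uniform(0.0, 60.0, outliers.sum())
    return speeds.clip(min=0.0), labels
```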

List of notebooks

  1. Data generation: Describes the data and outlier generation methodology with examples.
  2. Basic Tree model: Modelling the task using decision trees.
  3. Tree model with multiple time steps: A decision tree that has access to past time steps (see the lag-feature sketch after this list).
  4. CNN models: Modelling the task using convolutional neural networks.
    1. Basic CNN model: Modelling the task using a windowed CNN model (see the CNN sketch after this list).
  5. RNN models: Modelling the task using three different RNN models.
    1. Per-sample RNN model: Classify every element of a sequence by feeding the model the whole sequence at once (see the RNN sketch after this list).
    2. Split-window RNN model: Classify every element of a sequence by splitting long sequences into smaller windows.
    3. Split-window stateful RNN model: Keep state between batches so the RNN can continue training across the windows of a long sequence.
    4. Overlapping-window RNN model: Predict the last element of each window, sliding the window over the sequence.
  6. RNN advanced topics:
    1. RNN padding and masking: Generating data samples of different sizes and padding them in the RNN model (see the padding/masking sketch after this list).
    2. RNN class weights: TODO: Generating data samples with different class proportions. Class weights in the RNN model.
    3. RNN truncated back-propagation: TODO.
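
For notebook 3, the key idea is giving the tree access to past time steps by turning each step into a row of lagged speed values. A minimal sketch reusing generate_sample from above (the window size and tree depth are arbitrary illustrative choices):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def lag_features(speeds, n_lags):
    """Each row holds the current speed plus the n_lags previous speeds."""
    return np.array([speeds[i - n_lags:i + 1] for i in range(n_lags, len(speeds))])

n_lags = 5
speeds, labels = generate_sample()
X = lag_features(speeds, n_lags)
y = labels[n_lags:]  # labels aligned with rows that have a full history
tree = DecisionTreeClassifier(max_depth=5).fit(X, y)
```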
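
For notebook 4.1, the windowed CNN classifies one fixed-size window of speeds at a time. A minimal Keras sketch (the window size and layer widths are illustrative assumptions):

```python
import tensorflow as tf

window = 16
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(window, 1)),        # one feature per step: speed
    tf.keras.layers.Conv1D(16, kernel_size=3, activation="relu"),
    tf.keras.layers.GlobalMaxPooling1D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # P("train") for the window
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```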
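
For notebook 5.1, the per-sample RNN consumes a whole sequence and, with return_sequences=True, emits one prediction per time step. A minimal Keras sketch (layer sizes are illustrative):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(None, 1)),           # variable-length sequences, one feature
    tf.keras.layers.LSTM(32, return_sequences=True),  # one output per time step
    tf.keras.layers.Dense(1, activation="sigmoid"),   # per-step P("train")
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```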
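
For notebook 6.1, padding and masking let one batch hold sequences of different lengths: shorter sequences are padded to a common length, and a Masking layer tells the RNN to skip the padded steps. A sketch (the pad value is an assumption; it must not collide with genuine speed values, so zero is avoided here):

```python
import numpy as np
import tensorflow as tf

# Two sequences of different lengths, padded at the end with -1.0
seqs = [np.array([1.2, 1.4, 1.3]), np.array([9.5, 11.0])]
padded = tf.keras.preprocessing.sequence.pad_sequences(
    seqs, padding="post", value=-1.0, dtype="float32")[..., np.newaxis]

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(None, 1)),
    tf.keras.layers.Masking(mask_value=-1.0),  # padded steps are ignored downstream
    tf.keras.layers.LSTM(32, return_sequences=True),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
```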