The data comes from the Kaggle competition Tabular Playground Series April 2022. Each sample is a multivariate time series made up of 13 individual series, each with 60 data points.
This is a binary classification problem, so the goal is to predict the probability that an input belongs to class 1.
I tried several approaches at both the feature extraction and the modelling stages:
- Feature extraction
  - ROCKET
  - Shapelets
  - None (models trained on the raw series)
- Modelling
  - Gradient Boosting
  - Neural network (a basic fully connected network)
  - LSTM
  - Logistic regression
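
To give a feel for what a ROCKET-style feature extractor does, here is a minimal NumPy sketch. It is not the code from the notebooks and not the full ROCKET algorithm: it assumes the data is an array of shape `(n_samples, 13, 60)`, applies each random kernel to a single randomly chosen series, and uses no padding. The function name `random_kernel_features` and its parameters are made up for illustration.

```python
import numpy as np

def random_kernel_features(X, n_kernels=100, seed=0):
    """Simplified ROCKET-style transform (illustrative, not the notebook code).

    X has shape (n_samples, n_channels, n_timesteps).
    Returns shape (n_samples, 2 * n_kernels): max and PPV for each kernel.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_channels, n_timesteps = X.shape
    features = np.empty((n_samples, 2 * n_kernels))
    for k in range(n_kernels):
        length = int(rng.choice([7, 9, 11]))      # kernel length
        weights = rng.normal(size=length)
        weights -= weights.mean()                 # mean-centred weights
        bias = rng.uniform(-1.0, 1.0)
        # Sample the dilation on a log scale so the dilated kernel still fits.
        max_exp = np.log2((n_timesteps - 1) / (length - 1))
        dilation = int(2 ** rng.uniform(0, max_exp))
        channel = rng.integers(n_channels)        # simplification: one series per kernel
        # Index matrix for a dilated convolution without padding.
        n_out = n_timesteps - (length - 1) * dilation
        idx = np.arange(n_out)[:, None] + dilation * np.arange(length)[None, :]
        windows = X[:, channel, :][:, idx]        # (n_samples, n_out, length)
        out = windows @ weights + bias            # (n_samples, n_out)
        features[:, 2 * k] = out.max(axis=1)              # max pooling
        features[:, 2 * k + 1] = (out > 0).mean(axis=1)   # proportion of positive values
    return features

# Synthetic stand-in for the competition data: 8 samples, 13 series, 60 steps.
X = np.random.default_rng(1).normal(size=(8, 13, 60))
F = random_kernel_features(X, n_kernels=50)
print(F.shape)  # (8, 100)
```

The resulting feature matrix is an ordinary table, so it can be passed to any of the models listed above.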
Three notebooks are included in this repository. The first two have their own repos, but I also include them here to complete the collection.
- tps_apr2022_rocket focuses on ROCKET as a feature extractor. The video is here
- shapelet_tslearn focuses on using Shapelets to transform the dataset. The video is here. Note that I didn't set a seed when learning the shapelets, so the shapelet visualizations in this notebook may look slightly different from those in the video. This does not affect model performance.
- The video covering the models with low AUC is here. In that video I also share my thoughts on what we can learn from this project.
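
Since the models are compared by AUC, it may help to see what that number measures. The sketch below is a hypothetical helper (`roc_auc` is not from the notebooks) that computes AUC directly from its definition: the probability that a randomly chosen positive example gets a higher predicted probability than a randomly chosen negative one, with ties counted as one half.

```python
import numpy as np

def roc_auc(y_true, y_score):
    """AUC from its pairwise definition (illustrative helper, not notebook code)."""
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    pos = y_score[y_true]       # scores of class-1 examples
    neg = y_score[~y_true]      # scores of class-0 examples
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUC needs at least one example of each class")
    # Compare every positive score with every negative score.
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (pos.size * neg.size))

auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc)  # 0.75
```

This pairwise version uses memory proportional to the number of positive-negative pairs, so it only suits small arrays; for real datasets, `sklearn.metrics.roc_auc_score` is the more practical choice.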