Implementation of Transformer model (originally from Attention is All You Need) applied to Time Series (Powered by PyTorch).
Transformers are attention-based neural networks designed to solve NLP tasks. Their key features are:
- linear complexity in the dimension of the feature vector ;
- parallelisation of the computation over a sequence, as opposed to sequential computing ;
- long term memory, as we can look at any input time sequence step directly.
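For reference, these properties can be read off the scaled dot-product attention of Attention is All You Need. The notation below is an assumption made for illustration and is not taken from this repo: $n$ is the sequence length, $d$ the feature dimension, and $Q, K, V \in \mathbb{R}^{n \times d}$.

$$
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d}}\right) V
$$

Computing $Q K^{\top}$ costs $O(n^2 d)$ operations, which is linear in $d$ and quadratic in $n$. All $n$ time steps are handled in a single matrix product, and every row of the $n \times n$ attention map can weight any other time step directly.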
This repo will focus on their application to time series.
Our use-case is modeling a numerical simulator for building consumption prediction. To this end, we created a dataset by sampling random inputs (building characteristics and usage, weather, ...) and collecting the simulated outputs. We then converted these variables to a time series format and fed them to the Transformer.
In order to perform well on time series, a few adjustments had to be made:
- The embedding layer is replaced by a generic linear layer ;
- The original positional encodings are removed. A "regular" version, better matching the day/night patterns of the input sequence, can be used instead ;
- A window is applied on the attention map to limit backward attention, and focus on short term patterns.
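The three adjustments above can be sketched in a few lines of PyTorch. This is a minimal illustration under stated assumptions, not the code shipped in this repo: the helper names (`local_attention_mask`, `periodic_encoding`, `windowed_attention`), the symmetric window, and the single 24-step period are all hypothetical choices made for the example.

```python
import math

import torch
import torch.nn as nn


def local_attention_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean (seq_len, seq_len) mask, True where attention is blocked.

    Assumption: the window is symmetric, i.e. step i may attend to step j
    only if |i - j| <= window.
    """
    idx = torch.arange(seq_len)
    return (idx[None, :] - idx[:, None]).abs() > window


def periodic_encoding(seq_len: int, d_model: int, period: int = 24) -> torch.Tensor:
    """Positional encoding with one fixed period (assumed: 24 steps per day)."""
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)
    pe = torch.sin(2 * math.pi * pos / period)
    # The same periodic value is repeated across the feature dimension.
    return pe.repeat(1, d_model)


def windowed_attention(q, k, v, window: int) -> torch.Tensor:
    """Scaled dot-product attention restricted to a local window."""
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    mask = local_attention_mask(q.size(-2), window)
    # Blocked positions get -inf so that softmax assigns them zero weight.
    scores = scores.masked_fill(mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v


# Toy input: batch of 2 series, 48 time steps, 5 numerical features.
x = torch.randn(2, 48, 5)
embed = nn.Linear(5, 16)  # generic linear layer in place of a token embedding
h = embed(x) + periodic_encoding(48, 16)
out = windowed_attention(h, h, h, window=6)
```

With these toy sizes, `out` has the same shape as `h`, and each time step only mixes information from the 6 steps on either side of it.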
All required packages can be found in requirements.txt, and are expected to be run with python3.7. Note that you may have to install PyTorch manually if you are not using pip with a Debian distribution: head over to the PyTorch installation page. Here are a few lines to get started with pip and virtualenv:
$ apt-get install python3.7
$ pip3 install --upgrade --user pip virtualenv
$ virtualenv -p python3.7 .env
$ . .env/bin/activate
(.env) $ pip install -r requirements.txt
The dataset is not included in this repo, and must be downloaded manually. It consists of two files: dataset.npz contains all input and output values, and labels.json is a detailed list of the variables. Please refer to #2 for more information.
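Once both files are downloaded, a quick way to inspect them is sketched below. The array names stored in dataset.npz and the layout of labels.json are not documented here, so the helper (hypothetical name `summarize_dataset`) only lists what it finds and assumes nothing about the keys.

```python
import json

import numpy as np


def summarize_dataset(npz_path, labels_path):
    """Return ({array_name: shape}, labels) for the two dataset files."""
    # np.load on a .npz file returns a lazy archive; .files lists its arrays.
    with np.load(npz_path) as archive:
        shapes = {name: archive[name].shape for name in archive.files}
    with open(labels_path, encoding="utf-8") as f:
        labels = json.load(f)
    return shapes, labels


# Example call, assuming both files sit in the current directory:
# shapes, labels = summarize_dataset("dataset.npz", "labels.json")
```

Printing `shapes` should show which arrays hold the inputs and outputs, along with their dimensions, before they are fed to the training notebook.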
Using Jupyter, run the default training.ipynb notebook. All adjustable parameters can be found in the second cell. Be careful with BATCH_SIZE, as we use it to parallelize head and time chunk calculations.
The Transformer class can be used out of the box, see the docs for more info.
from tst import Transformer
net = Transformer(d_input, d_model, d_output, q, v, h, N, TIME_CHUNK, pe)
To build the doc:
(.env) $ cd docs && make html