Train on custom dataset
khabalghoul opened this issue · 1 comments
Hi! How are you?
I found that TSDiff could be a great tool for generating EEG data. I have a dataset containing the channel measurements from an EEG recorded in an experiment, and I would like to train your model on it. What should I do to train the model on a custom dataset?
Thanks!
Hi @tomyjara!
You can use something like this to build a custom dataset.
- Create a JSON Lines file with your time series data. Every line holds one time series in JSON format with two keys: `start` (the start timestamp) and `target` (the actual time series). I have attached an example file. Note that the time series are not required to have the same start or length.
- Use this function to load the file as a GluonTS dataset.
from pathlib import Path
from typing import Optional

from gluonts.dataset.split import split
from gluonts.dataset.common import (
    MetaData,
    TrainDatasets,
    FileDataset,
)


def get_custom_dataset(
    jsonl_path: Path,
    freq: str,
    prediction_length: int,
    split_offset: Optional[int] = None,
):
    """Creates a custom GluonTS dataset from a JSONLines file and
    given parameters.

    Parameters
    ----------
    jsonl_path
        Path to a JSONLines file with time series
    freq
        Frequency in pandas format
        (e.g., `H` for hourly, `D` for daily)
    prediction_length
        Prediction length
    split_offset, optional
        Offset to split data into train and test sets, by default None

    Returns
    -------
        A gluonts dataset
    """
    # By default, hold out the last `prediction_length` steps of each series.
    if split_offset is None:
        split_offset = -prediction_length

    metadata = MetaData(freq=freq, prediction_length=prediction_length)
    # The full series form the test set; the training set is the part
    # of each series before the split offset.
    test_ts = FileDataset(jsonl_path, freq)
    train_ts, _ = split(test_ts, offset=split_offset)
    dataset = TrainDatasets(metadata=metadata, train=train_ts, test=test_ts)
    return dataset
- This `get_custom_dataset` can be used as a replacement for
- Modify the default config appropriately, especially the context length, lags, etc.
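For the first step, here is a minimal sketch of how such a JSON Lines file could be written with the standard library only. It assumes each EEG channel is stored as its own univariate series. The channel names, values, start timestamp, output file name, and the `write_jsonl` helper are all made up for illustration and are not part of the repository.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(series_by_channel, start, path):
    """Write one JSON object per line with `start` and `target` keys.

    Hypothetical helper: one line per channel, all sharing the same start.
    """
    path = Path(path)
    with path.open("w") as f:
        for values in series_by_channel.values():
            record = {"start": start, "target": [float(v) for v in values]}
            f.write(json.dumps(record) + "\n")
    return path


# Toy data (assumption): two channels with different lengths,
# which is allowed because series need not share start or length.
channels = {
    "Fp1": [0.1, 0.4, 0.3, 0.2],
    "Fp2": [0.0, 0.2, 0.5],
}

out_path = write_jsonl(
    channels,
    start="2023-01-01 00:00:00",
    path=Path(tempfile.gettempdir()) / "eeg_example.jsonl",
)

# Read the file back to check that every line is valid JSON
# with exactly the two expected keys.
records = [json.loads(line) for line in out_path.read_text().splitlines()]
for record in records:
    assert set(record) == {"start", "target"}
```

The resulting path is what you would pass as `jsonl_path` to `get_custom_dataset`, together with a `freq` and `prediction_length` that fit your recording.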
Thanks @marcelkollovieh for helping with the response!