baraline/convst

Data Input for convst

aron-alarik opened this issue · 4 comments


Hi, do you still have an issue? If not, I will close this thread.
Please be advised that a huge update will soon be published (around February), modifying many parts of the current code. Try not to use this version in a final product or anything too "important".

Hi, sorry for the delay. I was not sure if this was the right place to ask, but here goes: I'm really excited about this library and want to use it in our production code when it becomes available, but I'm not sure how to format the input data. Also, is this library helpful for shapelet discovery, i.e. creating a database of unique shapelets (occurring under the influence of different external factors)?

Currently we have two different sets of data: the first holds the raw time series (single-sensor data) and features, and the second is a pre-discovered set of shapelets containing location and distance information. Some input on the type of input DataFrame you are expecting would be helpful. Thanks!!

Concerning shapelet discovery, it would depend on the usage you are planning. Shapelet parameters (length and dilation) are drawn randomly as a function of the input length, but their values are not totally random, as we try to estimate the location of the discriminant information between classes.
So if you want to use it to make sense of the model you learned, I don't see any issue; for other uses, you might want to first establish whether this "randomness" could have a negative impact. I would need a bit more detail on your task to provide guidance on this one.
The case of shapelet discovery could nevertheless be interesting to include as a kind of human-guided search with a GUI in future work on this project.
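For illustration only, here is a minimal sketch of what drawing a length and a dilation "as a function of the input length" can look like. It assumes a ROCKET-style scheme (the candidate lengths and the exponential dilation rule are assumptions, not taken from convst's source), and `sample_length_and_dilation` is a hypothetical helper:

```python
import numpy as np

def sample_length_and_dilation(n_timestamps, lengths=(7, 9, 11), rng=None):
    """Draw one (length, dilation) pair for a shapelet (hypothetical helper).

    ROCKET-style assumption: dilation = floor(2**x), x ~ U(0, log2((n - 1) / (length - 1))),
    which keeps the dilated span (length - 1) * dilation + 1 within n_timestamps.
    """
    rng = np.random.default_rng(rng)
    # Only keep candidate lengths (>= 2) that fit inside the series
    valid = [l for l in lengths if 2 <= l <= n_timestamps]
    if not valid:
        raise ValueError("no candidate length fits the series")
    length = int(rng.choice(valid))
    upper = np.log2((n_timestamps - 1) / (length - 1))
    # x < upper, so 2**x < (n - 1) / (length - 1) and floor(...) >= 1
    dilation = int(np.floor(2 ** rng.uniform(0, upper)))
    return length, dilation

rng = np.random.default_rng(42)
for _ in range(3):
    print(sample_length_and_dilation(500, rng=rng))
```

The upper bound on the exponent is what ties dilation to the input length: longer series allow larger dilations, while the dilated shapelet never extends past the end of the series.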
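To make sense of a learned shapelet, or to compare it with a pre-discovered set holding location and distance information, the quantity of interest is usually the minimum distance between the shapelet and a series, together with where that minimum occurs. The sketch below assumes plain Euclidean distance over a dilated window; `shapelet_min_distance` is hypothetical and may not match the distance convst computes internally (e.g. with respect to normalisation).

```python
import numpy as np

def shapelet_min_distance(x, shapelet, dilation=1):
    """Return (min_distance, location) of a dilated shapelet in a 1D series x.

    Sketch only: plain Euclidean distance, no z-normalisation.
    """
    x = np.asarray(x, dtype=float)
    shapelet = np.asarray(shapelet, dtype=float)
    # Number of timestamps covered by the dilated shapelet
    span = (len(shapelet) - 1) * dilation + 1
    if span > len(x):
        raise ValueError("dilated shapelet is longer than the series")
    best_dist, best_loc = np.inf, -1
    for start in range(len(x) - span + 1):
        # Dilated subsequence: every `dilation`-th point starting at `start`
        window = x[start:start + span:dilation]
        dist = np.sqrt(np.sum((window - shapelet) ** 2))
        if dist < best_dist:
            best_dist, best_loc = dist, start
    return best_dist, best_loc

x = np.array([0, 0, 1, 0, 2, 0, 3, 0, 0, 0])
print(shapelet_min_distance(x, [1, 2, 3], dilation=2))  # exact match starting at index 2
```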

About the input data format: internally, the algorithm works on a 3D NumPy array of shape (n_samples, n_features, n_timestamps). So for your raw sensor data, we would use (n_samples, 1, n_timestamps). Depending on the DataFrame format you are using, you could also pass a DataFrame as input, as it should normally be converted to this format by the check_array_3D(X, is_univariate=True) function.
How did you format your DataFrame? Do you currently get any errors when using the library? If so, give some details here so I can work on a fix.
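A minimal sketch of getting single-sensor data into the (n_samples, 1, n_timestamps) layout, assuming the data is already a 2D array (or a DataFrame with one row per sample and one column per timestamp). `to_univariate_3d` is a hypothetical helper for illustration, not check_array_3D itself:

```python
import numpy as np

def to_univariate_3d(X):
    """Reshape univariate data to (n_samples, 1, n_timestamps) (hypothetical helper)."""
    # np.asarray also accepts a pandas DataFrame with one row per sample
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        # A single series becomes one sample
        X = X[np.newaxis, :]
    if X.ndim != 2:
        raise ValueError(f"expected 1D or 2D input, got {X.ndim}D")
    # Insert the feature (channel) axis: n_features = 1 for a single sensor
    return X[:, np.newaxis, :]

X_raw = np.random.default_rng(0).normal(size=(20, 500))  # 20 recordings, 500 timestamps
print(to_univariate_3d(X_raw).shape)  # (20, 1, 500)
```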

Note that in the future version, the changes made to the algorithm will allow us to support multivariate and uneven-length time series, along with supervised or unsupervised shapelet extraction. The current code, however, only supports univariate, even-length time series in a supervised context.
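Until uneven-length support lands, all series have to share one length before being passed in. This hypothetical helper stacks a list of 1D series into the 3D layout and rejects uneven lengths; the optional truncation to the shortest length is only one possible workaround (it discards trailing points) and is an assumption, not something the library does:

```python
import numpy as np

def stack_even_length(series_list, truncate=False):
    """Stack 1D series into (n_samples, 1, n_timestamps) (hypothetical helper)."""
    arrays = [np.asarray(s, dtype=float) for s in series_list]
    if not arrays:
        raise ValueError("no series given")
    if any(a.ndim != 1 for a in arrays):
        raise ValueError("each series must be 1D (univariate)")
    lengths = sorted({len(a) for a in arrays})
    if len(lengths) > 1:
        if not truncate:
            raise ValueError(f"uneven lengths found: {lengths}")
        # Assumed workaround: cut every series to the shortest length
        arrays = [a[:lengths[0]] for a in arrays]
    return np.stack(arrays)[:, np.newaxis, :]

print(stack_even_length([[1, 2, 3], [4, 5, 6]]).shape)  # (2, 1, 3)
```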

Hi, thank you so much, this was extremely insightful and helpful. I am going to follow up with a few experiments and come back to you with the results... cheers!!