Thijsvanede/DeepLog

Variable length inputs

Closed this issue · 1 comment

Hello,

first of all, thank you for your work.

My problem is that I have sequences of variable length. They are stored as a list of lists containing the log keys.
Fitting the model works, but when I try to make predictions on the test sequences, it throws an error.
This is the relevant code snippet:

import torch

# Top-5 next-key predictions for each test sequence
y_pred_normal, confidence = model.predict(x_test, k=5, variable=True)

# True wherever the true key is not among the top-5 predictions
normal = ~torch.any(
    y_test == y_pred_normal.T,
    dim=0,
)

x_test is a list of 243 sequences, each stored as a list. Calling the predict method returns y_pred_normal with shape [243, 5], i.e. one set of top-5 predictions per sequence rather than one per log key within each sequence.

Comparing this against y_test, a one-dimensional tensor containing all 43986 log keys in order, naturally fails with RuntimeError: The size of tensor a (43986) must match the size of tensor b (243) at non-singleton dimension 1, because the shapes do not match.
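To make the shape reasoning concrete, here is a toy sketch of what I expected per log key (the numbers are made up, not actual DeepLog output):

import torch

# Toy sketch: one row of top-k predictions per log key position
n_keys, k = 10, 5
y_test_toy = torch.randint(0, 30, (n_keys,))    # true next key at each position
y_pred_toy = torch.randint(0, 30, (n_keys, k))  # top-k predicted keys per position

# y_pred_toy.T has shape [k, n_keys]; broadcasting against y_test_toy gives
# [k, n_keys], and torch.any(..., dim=0) checks per position whether the
# true key appears among the top k.
in_top_k = torch.any(y_test_toy == y_pred_toy.T, dim=0)
print(in_top_k.shape)  # torch.Size([10])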

My understanding is that the VariableDataLoader of torchtrain handles the conversion of the list of lists to a tensor in the fit and predict methods. Am I wrong in this assumption? Do I have to convert my sequences from lists into tensors before giving them to the model?
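Just to show what I mean by converting them first (purely illustrative; I do not know whether DeepLog/torchtrain actually expects this):

import torch

# Convert each variable-length sequence (a list of integer log keys)
# into a 1-D LongTensor before passing it to the model.
x_test_tensors = [torch.tensor(seq, dtype=torch.long) for seq in x_test]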

I hope you can help me with my problem.

Best regards

I was able to resolve the issue myself by switching from variable-length to fixed-length sequences, as used in the examples. I set the length to the maximum length of the training sequences, and with that the model works as expected.
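Roughly what I did, as a sketch: pad (or truncate) every sequence to a fixed length equal to the longest training sequence, then use the fixed-length workflow from the examples. The helper to_fixed_length, the pad key and the train_sequences/test_sequences names are mine, not part of DeepLog:

import torch

def to_fixed_length(sequences, length, pad_key=0):
    """Left-pad (or truncate) each sequence of log keys to a fixed length."""
    fixed = []
    for seq in sequences:
        seq = list(seq)[-length:]                    # keep at most `length` keys
        seq = [pad_key] * (length - len(seq)) + seq  # pad shorter sequences
        fixed.append(seq)
    return torch.tensor(fixed, dtype=torch.long)

# Fixed length = longest training sequence; pick a pad key that does not
# collide with a real log key.
length  = max(len(seq) for seq in train_sequences)
x_train = to_fixed_length(train_sequences, length)
x_test  = to_fixed_length(test_sequences, length)

# After this, fit and predict work as in the fixed-length examples
# (no variable=True needed).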