sample for predict on test rows

Question


Closed this issue 3 years ago · 1 comments

yeah that helped

Abnormal file (a single row):
1590959800 | Close Session command failed | head01

I am trying the code below and expect the predictions to be fixed values.

import torch
from deeplog import DeepLog
from deeplog.preprocessor import Preprocessor

preprocessor = Preprocessor(
    length = 20, # Extract sequences of 20 items
    timeout = float('inf'), # Do not include a maximum allowed time between events
)

X, y, label, mapping = preprocessor.csv(r'C:\ee18337_deeplog_.csv')
deeplog = DeepLog(
    input_size = 300, # Number of different events to expect
    hidden_size = 64 , # Hidden dimension, we suggest 64
    output_size = 300, # Number of different events to expect
)
deeplog = deeplog.to("cpu")
X = X.to("cpu")
y = y.to("cpu")

# Training

deeplog.fit(
    X = X,
    y = y,
    epochs = 10,
    batch_size = 128,
)

y_pred_normal, confidence = deeplog.predict(
    X = X,
    k = 3,
)
print(y_pred_normal)
# Another file, used to test whether a single row is predicted as an anomaly
X1, y1, label, mapping = preprocessor.csv(r'C:\out_directory_1\abnormal_routinne.csv')
X1 = X1.to("cpu")
y1 = y1.to("cpu")
#print("++++++++",y1)
y_pred_abnormal, confidence = deeplog.predict(
    X = X1,
    k = 3,
)
print("predicted output---> ",y_pred_normal)

anomalies_normal = ~torch.any(
    y_pred_abnormal.T == y_pred_normal.T,
    dim = 0,
)
print(f"False positives: {anomalies_normal.sum() / anomalies_normal.shape[0]}")

I expected to be able to run predict on a single row and flag it as an anomaly, but the "predicted output--->" line shows different values every time.
The dimensions also do not match here, because my abnormal routine file contains only 1 row. Does the abnormal data always need to have the same dimensions as the normal data?

Answer 1 · 2022-04-20T09:13:28.000Z

Hi krishna213,

I think that the main issue here is that we need to train for more epochs, as we have very little data to learn from.
Using your example, I changed the number of epochs from 10 to 100 and added some additional printing to show which values are predicted for both the normal and abnormal case.

Please note that in this case, when predicting on the abnormal data, we tend to predict the correct value, so the row is not flagged as an anomaly. This is because the actual next value is still among the top 3 most likely predicted values. DeepLog usually works better when trained and tested on very large datasets. For the details of DeepLog, please contact the original authors of the paper; I merely provided an implementation that we used to compare against our own work, DeepCASE.
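
The top-k rule mentioned above can be sketched in plain Python. This is only an illustration of the idea, not DeepLog's API: the helper names top_k_events and is_anomaly and the probability values are made up for the example.

```python
def top_k_events(probabilities, k):
    """Return the indices of the k most likely events, most likely first."""
    ranked = sorted(
        range(len(probabilities)),
        key=lambda event: probabilities[event],
        reverse=True,
    )
    return ranked[:k]


def is_anomaly(probabilities, actual_event, k=3):
    """Flag the sample only if the actual next event is outside the top k."""
    return actual_event not in top_k_events(probabilities, k)


# Made-up probabilities over 5 event IDs (hypothetical values)
probs = [0.05, 0.40, 0.30, 0.20, 0.05]

top3 = top_k_events(probs, 3)
third_choice_flagged = is_anomaly(probs, actual_event=3, k=3)  # event 3 is ranked third
unlikely_flagged = is_anomaly(probs, actual_event=0, k=3)      # event 0 is outside the top 3
stricter_flagged = is_anomaly(probs, actual_event=3, k=2)      # a smaller k is stricter

print(top3, third_choice_flagged, unlikely_flagged, stricter_flagged)
```

With k = 3, an event that is only the third most likely prediction still counts as correct, which is why a rare but plausible row may not be flagged.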

import torch
from deeplog import DeepLog
from deeplog.preprocessor import Preprocessor

preprocessor = Preprocessor(
    length = 20, # Extract sequences of 20 items
    timeout = float('inf'), # Do not include a maximum allowed time between events
)

X, y, label, mapping = preprocessor.csv('ee18337_deeplog_.csv')
deeplog = DeepLog(
    input_size = 300, # Number of different events to expect
    hidden_size = 64 , # Hidden dimension, we suggest 64
    output_size = 300, # Number of different events to expect
)

deeplog = deeplog.to("cpu")
X = X.to("cpu")
y = y.to("cpu")

# Training

deeplog.fit(
    X = X,
    y = y,
    epochs = 100, # Train for a longer time, with so few samples we need to train a bit longer
    batch_size = 128,
)

y_pred_normal, confidence = deeplog.predict(
    X = X,
    k = 3,
)

# Added extra printing
print("predicted normal output---> ", y_pred_normal)
# Show in terms of actual predicted values
for index, row in enumerate(y_pred_normal):
    print("Prediction y_pred_normal most->least likely: ", ', '.join(mapping.get(x.item(), 'UNKNOWN') for x in row))

# Another file, used to test whether a single row is predicted as an anomaly
X1, y1, label, mapping = preprocessor.csv('abnormal_routinne.csv')
X1 = X1.to("cpu")
y1 = y1.to("cpu")
#print("++++++++",y1)
y_pred_abnormal, confidence = deeplog.predict(
    X = X1,
    k = 3,
)

# Added extra printing
print("predicted abnormal output---> ",y_pred_abnormal) # Show predicted abnormal instead

anomalies_normal = ~torch.any(
    y == y_pred_normal.T, # Compare the normal predictions with the actual next value, not with the abnormal predictions
    dim = 0,
)

anomalies_abnormal = ~torch.any(
    y1 == y_pred_abnormal.T, # Compare the abnormal predictions with the actual next value of the abnormal data
    dim = 0,
)

print(f"False positives   normal: {anomalies_normal.sum() / anomalies_normal.shape[0]}")
print(f"True  positives abnormal: {anomalies_abnormal.sum() / anomalies_abnormal.shape[0]}")
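
On the dimension question: because each set of predictions is compared with its own actual next values, the normal and abnormal data do not need the same number of rows. The sketch below mirrors the comparison used above on made-up tensors; the helper anomaly_flags and all event IDs are hypothetical, and the shapes (n,) for the targets and (n, k) for the predictions are assumed from how the code above uses them.

```python
import torch


def anomaly_flags(y_true: torch.Tensor, y_pred: torch.Tensor) -> torch.Tensor:
    """y_true has shape (n,), y_pred has shape (n, k).

    Returns a boolean tensor of shape (n,), True where the actual
    next event is not among the k predicted events.
    """
    hits = y_true == y_pred.T        # broadcasts to shape (k, n)
    return ~torch.any(hits, dim=0)   # shape (n,)


# Made-up event IDs: 4 normal rows and a single abnormal row
y_normal = torch.tensor([4, 7, 2, 9])
pred_normal = torch.tensor([[4, 1, 3], [5, 7, 0], [2, 6, 8], [1, 3, 5]])
y_abnormal = torch.tensor([11])
pred_abnormal = torch.tensor([[4, 7, 2]])

flags_normal = anomaly_flags(y_normal, pred_normal)
flags_abnormal = anomaly_flags(y_abnormal, pred_abnormal)

# Fraction of rows flagged in each set, computed independently
rate_normal = flags_normal.float().mean().item()
rate_abnormal = flags_abnormal.float().mean().item()
print(flags_normal.tolist(), rate_normal)
print(flags_abnormal.tolist(), rate_abnormal)
```

A single abnormal row works here because the comparison is made per row within one dataset; nothing is compared across the normal and abnormal tensors.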