Thijsvanede/DeepLog

sample for predict on test rows


yeah that helped

According to the example, predict should return 1 or 2, but when I ran it I got [127], [127], [127], and the values change on every run. I also expected 1 or 2. Why does it return seemingly random states in this case?
What does the hdfs_test_abnormal data cover, and how do I prepare it? I don't have an abnormal routine dataset.
csv-->
timestamp | event | machine
1590949800 | Get SDR 0034 command failed | head01
1590949800 | Get SDR 0034 command failed | head01
1590939800 | Close Session command failed | head01
1590959800 | Close Session command failed | head01
1590959800 | Watchdog | EBh | ok | 6.1 | | head01

abnormal file
1590959800 | Close Session command failed | head01

I am trying the code below and expect predict to return fixed values.

import torch
from deeplog import DeepLog
from deeplog.preprocessor import Preprocessor

preprocessor = Preprocessor(
length = 20, # Extract sequences of 20 items
timeout = float('inf'), # Do not include a maximum allowed time between events
)

X, y, label, mapping = preprocessor.csv(r'C:\ee18337_deeplog_.csv')
deeplog = DeepLog(
input_size = 300, # Number of different events to expect
hidden_size = 64 , # Hidden dimension, we suggest 64
output_size = 300, # Number of different events to expect
)
deeplog = deeplog.to("cpu")
X = X.to("cpu")
y = y.to("cpu")

# training

deeplog.fit(
X = X,
y = y,
epochs = 10,
batch_size = 128,
)

y_pred_normal, confidence = deeplog.predict(
X = X,
k = 3,
)
print(y_pred_normal)
# another file, used to test whether a single row is predicted as an anomaly
X1, y1, label, mapping = preprocessor.csv(r'C:\out_directory_1\abnormal_routinne.csv')
X1 = X1.to("cpu")
y1 = y1.to("cpu")
#print("++++++++",y1)
y_pred_abnormal, confidence = deeplog.predict(
X = X1,
k = 3,
)
print("predicted output---> ",y_pred_normal)

anomalies_normal = ~torch.any(
y_pred_abnormal.T == y_pred_normal.T,
dim = 0,
)
print(f"False positives: {anomalies_normal.sum() / anomalies_normal.shape[0]}")

I expect to be able to run predict on a single row and flag it as an anomaly, but the "predicted output--->" line shows different values on every run.
My dimensions also do not match here, because my abnormal routine file contains only 1 row. Do the dimensions of the abnormal routine data always have to match those of the normal data?

Hi krishna213,

I think the main issue here is that we need to train for more epochs, as we have very little data to learn from.
Using your example, I changed the number of epochs from 10 to 100 and added some extra printing to show which values are predicted for both the normal and the abnormal case.

Please note that in this case the abnormal sample still tends to be counted as correctly predicted, because its actual value is among the top 3 most likely predictions. DeepLog usually works better when trained and tested on very large datasets. For the details of DeepLog itself, please contact the original authors of the paper. I merely provided an implementation that we used to compare against our own work, DeepCASE.

import torch
from deeplog import DeepLog
from deeplog.preprocessor import Preprocessor

preprocessor = Preprocessor(
    length = 20, # Extract sequences of 20 items
    timeout = float('inf'), # Do not include a maximum allowed time between events
)

X, y, label, mapping = preprocessor.csv('ee18337_deeplog_.csv')
deeplog = DeepLog(
    input_size = 300, # Number of different events to expect
    hidden_size = 64 , # Hidden dimension, we suggest 64
    output_size = 300, # Number of different events to expect
)

deeplog = deeplog.to("cpu")
X = X.to("cpu")
y = y.to("cpu")

# training

deeplog.fit(
    X = X,
    y = y,
    epochs = 100, # Train for a longer time, with so few samples we need to train a bit longer
    batch_size = 128,
)

y_pred_normal, confidence = deeplog.predict(
    X = X,
    k = 3,
)

# Added extra printing
print("predicted normal output---> ", y_pred_normal)
# Show in terms of actual predicted values
for index, row in enumerate(y_pred_normal):
    print("Prediction y_pred_normal most->least likely: ", ', '.join(mapping.get(x.item(), 'UNKNOWN') for x in row))

# another file, used to test whether a single row is predicted as an anomaly
X1, y1, label, mapping = preprocessor.csv('abnormal_routinne.csv')
X1 = X1.to("cpu")
y1 = y1.to("cpu")
#print("++++++++",y1)
y_pred_abnormal, confidence = deeplog.predict(
    X = X1,
    k = 3,
)

# Added extra printing
print("predicted abnormal output---> ",y_pred_abnormal) # Show predicted abnormal instead

anomalies_normal = ~torch.any(
    y == y_pred_normal.T, # We want to compare the normal values with the actual next value, not with the abnormal next value
    dim = 0,
)

anomalies_abnormal = ~torch.any(
    y1 == y_pred_abnormal.T, # Likewise, compare the abnormal predictions with the actual next value of the abnormal data
    dim = 0,
)

print(f"False positives   normal: {anomalies_normal.sum() / anomalies_normal.shape[0]}")
print(f"True  positives abnormal: {anomalies_abnormal.sum() / anomalies_abnormal.shape[0]}")
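
The top-k check used above can also be tried in isolation. The following is a minimal sketch in plain Python, not part of the DeepLog package: `flag_anomalies` is a hypothetical helper and the event IDs are made up for illustration. It flags a sample as anomalous when its actual next event is not among the k predicted candidates. Each dataset is compared only against its own actual values, so the normal and abnormal sets do not need the same number of rows.

```python
def flag_anomalies(y_true, y_pred_topk):
    """Return one boolean per sample: True when the actual next event
    is NOT among the k predicted candidates for that sample.

    Hypothetical helper for illustration; not part of the deeplog package.
    """
    if len(y_true) != len(y_pred_topk):
        raise ValueError("need one row of top-k predictions per sample")
    return [
        actual not in candidates
        for actual, candidates in zip(y_true, y_pred_topk)
    ]


# Made-up event IDs with k = 3: five "normal" samples and one "abnormal" sample.
y_normal = [0, 1, 1, 2, 0]
pred_normal = [[0, 1, 2], [1, 0, 2], [1, 2, 0], [2, 1, 0], [1, 2, 3]]

y_abnormal = [4]
pred_abnormal = [[0, 1, 2]]

flags_normal = flag_anomalies(y_normal, pred_normal)
flags_abnormal = flag_anomalies(y_abnormal, pred_abnormal)

# Fraction of samples flagged as anomalous in each set
print("Flagged in normal set:  ", sum(flags_normal) / len(flags_normal))
print("Flagged in abnormal set:", sum(flags_abnormal) / len(flags_abnormal))
```

To feed it the tensors from the script above, they would first have to be converted to lists, for example with `y.tolist()` and `y_pred_normal.tolist()`.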