Input format
Hello,
I am wondering about the required input format. preprocessor.py says that the CSV fields timestamp, machine, event and optionally label are required.
Would it be possible to add an example input csv file which demonstrates the meaning of each of these fields?
For me, the meaning of the fields timestamp and event is clear. The meaning of the field machine is much less clear, since the original paper on DeepLog does not distinguish between logs originating from different machines.
Also, I am missing fields for log parameter values. Are these somehow included in the event field, or does this implementation not support the parameter value anomaly detection model as described in the original paper?
I'd be thankful for an answer.
Best regards, Juri
Hi Juri,
Indeed, DeepLog does not necessarily distinguish between events produced by different machines. The reason this was added was to be more consistent with our own work on DeepCASE. Basically, the machine value allows you to specify a unique identifier for each machine producing an event. This can be an IP address, MAC address, or any string value that you wish to use to identify the machine. If you do not want to distinguish between machines, i.e., run the analysis over all events of all machines simultaneously, you can give each entry in the machine column the same value or leave it blank.
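As a minimal sketch of the "same value for every entry" option, assuming the events already sit in a CSV with timestamp and event columns, the machine column can be filled with one shared value using Python's csv module. The helper name add_constant_machine and the value "all" are hypothetical and not part of this repository.

```python
import csv
import io


def add_constant_machine(rows, machine="all"):
    """Return copies of the rows with the 'machine' field set to one shared value.

    Hypothetical helper for illustration only.
    """
    return [{**row, "machine": machine} for row in rows]


# Example input without a machine column (in-memory stand-in for a file).
source = io.StringIO("timestamp,event\n1638060458,280\n1638060458,281\n")
rows = add_constant_machine(list(csv.DictReader(source)))

# Write the rows back out with the three expected columns.
out = io.StringIO()
writer = csv.DictWriter(
    out, fieldnames=["timestamp", "event", "machine"], lineterminator="\n"
)
writer.writeheader()
writer.writerows(rows)
result = out.getvalue()
print(result)
```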
This DeepLog repository only implements the log key anomaly detection of DeepLog and therefore does not support the parameter value anomaly detection model. Should you wish to implement it and add it to this repository, then I am of course happy to accept pull requests.
Thanks for your reply!
Am I right in assuming the following format for the input:
timestamp,event,machine
1638060458,280,head01
1638060458,280,head01
1638060458,281,head01
1638060458,281,head01
1638060458,282,head01
given the following logfile:
1638060458 Get SDR 0034 command failed
1638060458 Get SDR 0034 command failed
1638060458 Close Session command failed
1638060458 Close Session command failed
1638060458 Watchdog | EBh | ok | 6.1 |
I.e. the event is just some arbitrary identifier for any given event type?
Yes, that is indeed correct. In the example that you are providing:
- 280 is the identifier for Get SDR 0034 command failed
- 281 is the identifier for Close Session command failed
- 282 is the identifier for Watchdog | EBh | ok | 6.1 |
- All logs in the logfile originate from the machine head01
So you can just add an arbitrary identifier for any given event type. Please note that you can also simply use a string as an identifier. I.e., for the logfile from your example, the following CSV file would also be correct:
timestamp,event,machine
1638060458,Get SDR 0034 command failed,head01
1638060458,Get SDR 0034 command failed,head01
1638060458,Close Session command failed,head01
1638060458,Close Session command failed,head01
1638060458,Watchdog | EBh | ok | 6.1 |,head01
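A conversion from the logfile to this CSV layout could be sketched as follows. It assumes that every log line starts with a Unix timestamp, followed by a single space and the message, as in the example logfile, and that all lines come from one machine. The function name logfile_to_csv is hypothetical and not part of this repository.

```python
import csv
import io


def logfile_to_csv(lines, machine):
    """Convert '<timestamp> <message>' lines into CSV text.

    Hypothetical helper: the full message is used as the event identifier.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["timestamp", "event", "machine"])
    for line in lines:
        line = line.strip()
        if not line:
            continue  # Skip empty lines
        # Split only on the first space: timestamp first, message after it
        timestamp, message = line.split(" ", 1)
        writer.writerow([timestamp, message, machine])
    return out.getvalue()


log_lines = [
    "1638060458 Get SDR 0034 command failed",
    "1638060458 Close Session command failed",
    "1638060458 Watchdog | EBh | ok | 6.1 |",
]
csv_text = logfile_to_csv(log_lines, machine="head01")
print(csv_text)
```

Messages that contain a comma would be quoted by the csv module, so they still end up in a single event field.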
Example in code
The DeepCASE preprocessor will internally map all unique identifiers to a numerical representation that is accessible via the mapping variable returned by the preprocessor.csv() method. (I just saw that the mapping variable was not documented in the return of csv, so I just updated the documentation as well :))
# DeepCASE Imports
from deepcase.preprocessing import Preprocessor

if __name__ == "__main__":
    ########################################################################
    #                             Loading data                             #
    ########################################################################

    # Create preprocessor
    preprocessor = Preprocessor(
        length  = 5,     # 5 events in context for this example
        timeout = 86400, # Ignore events older than 1 day (60*60*24 = 86400 seconds)
    )

    # Load data from file
    context, events, labels, mapping = preprocessor.csv('path/to/example/given/above.csv')

    print(context)
    print(events)
    print(labels)
    print(mapping)
This will give the following output:
Context:
tensor([[3, 3, 3, 3, 3],
        [3, 3, 3, 3, 1],
        [3, 3, 3, 1, 1],
        [3, 3, 1, 1, 0],
        [3, 1, 1, 0, 0]])
Events:
tensor([1, 1, 0, 0, 2])
Labels:
None
Mapping:
{0: 'Close Session command failed', 1: 'Get SDR 0034 command failed', 2: 'Watchdog | EBh | ok | 6.1 |', 3: -1337}
Here the number 3, which the mapping resolves to -1337, means NO CONTEXT EVENT AVAILABLE.
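To read the output above, the mapping can be applied in reverse. The sketch below copies the values from the output shown and uses plain Python lists in place of the returned tensors (a tensor can be converted with its tolist() method). The decode helper and the "<no event>" placeholder are hypothetical.

```python
# Values copied from the example output above
mapping = {
    0: 'Close Session command failed',
    1: 'Get SDR 0034 command failed',
    2: 'Watchdog | EBh | ok | 6.1 |',
    3: -1337,
}
events = [1, 1, 0, 0, 2]
context = [
    [3, 3, 3, 3, 3],
    [3, 3, 3, 3, 1],
    [3, 3, 3, 1, 1],
    [3, 3, 1, 1, 0],
    [3, 1, 1, 0, 0],
]

# Value that marks "no context event available" in this example's mapping
NO_EVENT = -1337


def decode(ids):
    """Translate numerical ids back to the original event identifiers."""
    return ["<no event>" if mapping[i] == NO_EVENT else mapping[i] for i in ids]


decoded_events = decode(events)
decoded_last_context = decode(context[-1])

# Print each event together with the context that preceded it
for event, ctx in zip(decoded_events, context):
    print(event, "<-", decode(ctx))
```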