logpai/logparser

Drain does not extract text patterns?

vikramriyer opened this issue · 1 comment

Thanks for putting this together, team. I have been trying to use the Drain algorithm specifically and came across this issue.

'user=mike ip=unknown-ip-addr cmd=Metastore shutdown complete',
'user=mike ip=unknown-ip-addr cmd=Shutting down the object store',
'user=smith ip=unknown-ip-addr cmd=Metastore shutdown complete',
'user=smith ip=unknown-ip-addr cmd=Shutting down the object store',
'user=jackson ip=unknown-ip-addr cmd=Metastore shutdown complete',
'user=jackson ip=unknown-ip-addr cmd=Shutting down the object store',
'user=bob ip=unknown-ip-addr cmd=Metastore shutdown complete',
'user=bob ip=unknown-ip-addr cmd=Shutting down the object store'

Ideally, all of these lines should be parsed into one similar pattern, i.e., a template of the form

user=<*> ip=<*> cmd=<*>

However, Drain does not pick this up. I have tried several values for sim_th, depth, and max_children.
It does pick up user and masks it, but it fails for ip, cmd, and other similar text that might not be part of any dictionary. Is there a way to mask these automatically, other than writing regexes?

Am I missing something? Can someone help?

Our current implementation splits each log message on whitespace, as follows:

logmessageL = self.preprocess(line['Content']).strip().split()
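To illustrate why ip and cmd are not masked, here is a minimal sketch that borrows two ideas from Drain under simplifying assumptions: lines are tokenized with `str.split()` as in the line above, lines are grouped by token count (the first layer of Drain's parse tree), and any position whose token differs within a group becomes `<*>`. It deliberately skips Drain's prefix tree, similarity threshold, and `preprocess` step, so it is not the logparser implementation; `tokenize` and `merge_templates` are hypothetical helper names.

```python
from collections import defaultdict

# The sample lines from the issue.
LOGS = [
    'user=mike ip=unknown-ip-addr cmd=Metastore shutdown complete',
    'user=mike ip=unknown-ip-addr cmd=Shutting down the object store',
    'user=smith ip=unknown-ip-addr cmd=Metastore shutdown complete',
    'user=smith ip=unknown-ip-addr cmd=Shutting down the object store',
    'user=jackson ip=unknown-ip-addr cmd=Metastore shutdown complete',
    'user=jackson ip=unknown-ip-addr cmd=Shutting down the object store',
    'user=bob ip=unknown-ip-addr cmd=Metastore shutdown complete',
    'user=bob ip=unknown-ip-addr cmd=Shutting down the object store',
]


def tokenize(line):
    # Same tokenization as the snippet above: split on runs of whitespace.
    return line.strip().split()


def merge_templates(lines, tokenize):
    """Hypothetical helper: group lines by token count, then replace every
    position whose token is not identical across the group with '<*>'."""
    groups = defaultdict(list)
    for line in lines:
        tokens = tokenize(line)
        groups[len(tokens)].append(tokens)
    templates = {}
    for length, rows in sorted(groups.items()):
        merged = [col[0] if len(set(col)) == 1 else '<*>' for col in zip(*rows)]
        templates[length] = ' '.join(merged)
    return templates


if __name__ == '__main__':
    for length, template in merge_templates(LOGS, tokenize).items():
        print(length, template)
```

In this simplified model, the whole `user=...` token is wildcarded because it differs between lines, while `ip=unknown-ip-addr` is identical in every sample line, so there is nothing to mask at that position. The text after `cmd=` has 3 words in some lines and 5 in others, so the two message kinds land in different length groups (5 and 7 tokens) instead of merging into a single `cmd=<*>` template.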

You could try modifying this line to also use "=" as a delimiter, or preprocess the log messages beforehand.
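Both suggestions can be sketched as follows, assuming a plain-function setting rather than the actual Drain class: option 1 replaces the whitespace-only split with a regex split on whitespace and "=", and option 2 pads "=" with spaces before the unchanged `str.split()`. The function names `split_whitespace_and_equals` and `pad_equals` are hypothetical and not part of logparser.

```python
import re

LINE = 'user=mike ip=unknown-ip-addr cmd=Metastore shutdown complete'


def split_whitespace_and_equals(line):
    """Option 1 (hypothetical): split on whitespace *and* '=' instead of
    whitespace only, so keys and values become separate tokens."""
    return [tok for tok in re.split(r'[\s=]+', line.strip()) if tok]


def pad_equals(line):
    """Option 2 (hypothetical preprocessing): surround '=' with spaces so
    that the unchanged str.split() emits it as a token of its own."""
    return line.replace('=', ' = ')


if __name__ == '__main__':
    print(split_whitespace_and_equals(LINE))
    # ['user', 'mike', 'ip', 'unknown-ip-addr', 'cmd', 'Metastore', 'shutdown', 'complete']
    print(pad_equals(LINE).strip().split())
    # ['user', '=', 'mike', 'ip', '=', 'unknown-ip-addr', 'cmd', '=', 'Metastore', 'shutdown', 'complete']
```

With either option, the user name becomes a token by itself, so Drain would likely wildcard only the value (e.g. `user <*>`) rather than the whole `user=...` token. Note that `unknown-ip-addr` still does not vary across these sample lines, and the `cmd` values still contain different numbers of words, so these samples would still produce two templates rather than one `user=<*> ip=<*> cmd=<*>`.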