Drain does not extract text patterns?
vikramriyer opened this issue · 1 comments
Thanks for putting this together, team. I have been trying to use the Drain
algorithm specifically and came across this issue with the following log messages:
'user=mike ip=unknown-ip-addr cmd=Metastore shutdown complete',
'user=mike ip=unknown-ip-addr cmd=Shutting down the object store',
'user=smith ip=unknown-ip-addr cmd=Metastore shutdown complete',
'user=smith ip=unknown-ip-addr cmd=Shutting down the object store',
'user=jackson ip=unknown-ip-addr cmd=Metastore shutdown complete',
'user=jackson ip=unknown-ip-addr cmd=Shutting down the object store',
'user=bob ip=unknown-ip-addr cmd=Metastore shutdown complete',
'user=bob ip=unknown-ip-addr cmd=Shutting down the object store'
Ideally, all of these would map to a similar pattern, i.e. of the form
user=<*> ip=<*> cmd=<*>
However, the Drain algorithm does not pick this up. I have tried several values of sim_th, depth, and max_children.
It does pick up user and masks it, but fails for ip and cmd and other similar text that might not be part of any dictionary. Is there a way to mask this automatically, other than writing a regex?
Am I missing something? Can someone help?
Our current implementation splits a log message by whitespace, as follows:
logparser/logparser/Drain/Drain.py
Line 260 in 244005f
You could try to modify this line to add "=" as a splitter, or preprocess the log messages beforehand.
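For the preprocessing route, here is a minimal sketch in plain Python. It does not call into logparser at all; `tokenize` and `preprocess` are hypothetical helper names, not part of the library. The first shows what splitting on both whitespace and "=" produces; the second rewrites a message so that a whitespace-only splitter sees each key and its value as separate tokens.

```python
import re


def tokenize(line):
    """Split on runs of whitespace and/or '=' and drop empty tokens.

    Hypothetical helper: illustrates the effect of adding '=' as a splitter.
    """
    return [tok for tok in re.split(r"[\s=]+", line.strip()) if tok]


def preprocess(line):
    """Insert a space after every '=' so that a whitespace-only splitter
    treats 'user=' and 'mike' as two tokens instead of one.

    Hypothetical helper: run it on each message before handing it to the parser.
    """
    return re.sub(r"=\s*", "= ", line).strip()


msg = "user=mike ip=unknown-ip-addr cmd=Metastore shutdown complete"

# Plain whitespace splitting keeps each key=value pair glued together.
print(msg.split())
# Splitting on '=' as well separates keys from values.
print(tokenize(msg))
# Preprocessed message, ready for a whitespace-based tokenizer.
print(preprocess(msg))
```

Two caveats for the sample above: the ip value is identical in every message, so a parser that only masks positions that vary may still keep it as a literal token, and the two cmd values have different word counts (3 vs. 5), so they may still end up in separate templates.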