moloch--/leakdb

"source" field in normalized JSON?

darrenmartyn opened this issue · 5 comments

Would it be feasible to add a "source" field to the JSON/indexed data, so you could "tag" entries as coming from specific leaks?

This could be very useful when going back later to attribute where a piece of data came from, but I'm unsure whether it would have a performance impact.

I don't think it would have much of an impact on performance. Most of the code operates on lines, not on the actual content of each line, so little code would need to change. A few other folks have been asking for something like this, so I'll probably look at adding it. It would reduce the Bloom filter's ability to de-duplicate identical user/password combos, since the same combo from two different sources would no longer be an identical line. That means more lines would survive de-duplication, so there could be a modest impact on index/sort times, but I don't think there'd be a large impact on search times.
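To illustrate the de-duplication trade-off, here is a minimal Go sketch. It assumes records are de-duplicated on the full serialized line, and it uses a plain map as a stand-in for the Bloom filter (a real Bloom filter can also report false positives, which a map cannot). The JSON field names, including `source`, are hypothetical and not taken from leakdb's actual schema.

```go
package main

import "fmt"

// dedupLines keeps the first occurrence of each exact line. The map is a
// stand-in for a Bloom filter: both key on the whole line, not on the
// user/password pair inside it.
func dedupLines(lines []string) []string {
	seen := make(map[string]struct{})
	var out []string
	for _, l := range lines {
		if _, ok := seen[l]; ok {
			continue // exact duplicate line, drop it
		}
		seen[l] = struct{}{}
		out = append(out, l)
	}
	return out
}

func main() {
	// Same credential found in two dumps, without a source tag:
	// the lines are byte-identical, so dedup collapses them.
	noSource := []string{
		`{"email":"alice@example.com","password":"hunter2"}`,
		`{"email":"alice@example.com","password":"hunter2"}`,
	}
	// Same credential with a (hypothetical) source tag: the lines now
	// differ, so line-level dedup keeps both.
	withSource := []string{
		`{"email":"alice@example.com","password":"hunter2","source":"dump-a"}`,
		`{"email":"alice@example.com","password":"hunter2","source":"dump-b"}`,
	}
	fmt.Println(len(dedupLines(noSource)), len(dedupLines(withSource)))
}
```

Running it prints `1 2`: the tagged records both survive, which is where the extra index/sort work would come from.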

Any news on this feature request?

Not had time to work on it yet, sorry!

Maybe. Most of the code only cares about "lines" in a file; you'd have to extend the normalizer to add a "source" field to the JSON format, and extend the few parts of the code that parse the JSON to optionally handle the extra field.