Keys with '.' causing error on indexing of JSON scan report in Elasticsearch
akniffe1 opened this issue · 3 comments
While using Elasticsearch as a database for FSF scan reports, I noticed an indexing problem caused by the key structure in META_PE, shown below:
"Imports": {
"version.dll": [...]
It appears that this will affect any database that is storing the raw report as 'flat' JSON, though I've only tested on Elasticsearch directly.
After discussing with jxb5151, there may be an opportunity to globally sanitize the final scan report before it's written locally, either by the client submitting the scan report to the database or perhaps as a cleanup function within FSF. The objective is that module authors probably shouldn't have to account for this.
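A minimal sketch of what such a cleanup function could look like, assuming the final report is an ordinary Python dict at that point. The name `sanitize_keys`, the `_` replacement character, and the import name in the sample report are illustrative assumptions, not existing FSF code:

```python
def sanitize_keys(obj, replacement="_"):
    """Return a copy of obj with '.' replaced in every dict key, at any depth.

    Hypothetical helper, not part of FSF. Values are left untouched.
    """
    if isinstance(obj, dict):
        return {
            (key.replace(".", replacement) if isinstance(key, str) else key):
                sanitize_keys(value, replacement)
            for key, value in obj.items()
        }
    if isinstance(obj, list):
        # Lists may contain nested dicts, so recurse into each element.
        return [sanitize_keys(item, replacement) for item in obj]
    return obj


# Sample report fragment shaped like the META_PE output above
# (the imported function name is made up for illustration).
report = {"META_PE": {"Imports": {"version.dll": ["GetFileVersionInfoW"]}}}
clean = sanitize_keys(report)
print(clean)
```

One caveat with this approach: two distinct keys such as `a.b` and `a_b` would collapse into the same key after replacement, so a real implementation may need a less common replacement string or a collision check.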
Currently testing the following architecture:
Filebeat --> Logstash (with de_dot filter and JSON codec) --> Elasticsearch with Kibana visualization
Will submit a pull request with configs shortly.
A little more digging on this issue uncovered that de_dot does not process anything below the top level of the event. You can specify subfields manually, but I don't think that scales well. The only alternative I have seen is what someone posted in this discussion:
https://discuss.elastic.co/t/field-name-cannot-contain/33251/43
That approach uses Ruby to process '.' recursively. However, I suspect the performance impact would not be manageable for those doing this at a larger scale. I have not tested this independently; it is only my gut feeling.
Third-party data sources that people may integrate with at some level (like VirusTotal) may have '.' in key names in the same way META_PE does in its current form. While we can't control how others choose to represent their data, we can modify META_PE at very little cost to us and eliminate this issue for those using the modules included in this build.
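One possible shape for that change, sketched under assumptions: instead of using the DLL name as a key, the module could emit a list of objects and carry the name as a value. The function `imports_as_list` and the field names `lib` and `functions` are hypothetical, as is the imported function name in the sample data; this is not the actual META_PE code.

```python
def imports_as_list(imports):
    """Convert {"version.dll": [...]} into a list of {"lib": ..., "functions": ...}.

    Hypothetical restructuring: the DLL name becomes a value rather than a key,
    so no key in the output can contain a '.'.
    """
    return [
        {"lib": lib, "functions": list(functions)}
        for lib, functions in sorted(imports.items())
    ]


# Sample data shaped like the current META_PE "Imports" output.
current = {"version.dll": ["GetFileVersionInfoW"]}
restructured = imports_as_list(current)
print(restructured)
```

Unlike a key replacement, this keeps the original DLL name intact in the stored report.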