logstash-plugins/logstash-filter-fingerprint

The fingerprint is non-deterministic for events with a nested map

Closed this issue · 2 comments

When an event contains a nested data structure, the order in which its fields are hashed is not deterministic. Our specific use case is that we add a location to each log event as a nested data structure:

mutate { add_field => { "[site][name]" => "test" } }
mutate {
    add_field => {
        "[site][location][lat]" => "13.35"
        "[site][location][lon]" => "4.5"
    }
}
fingerprint {
    target => "[@metadata][fingerprint]"
    method => "SHA1"
    key => "FingerPrintSeed"
    concatenate_all_fields => true
}

With trace logging enabled, the string that is sent to the SHA1 algorithm can be observed with different key orderings (the dots indicate that each line is a substring of the full string):

...|site|{\"name\"=>\"test\", \"location\"=>{\"lat\"=>13.35, \"lon\"=>4.5}}|...
...|site|{\"location\"=>{\"lat\"=>13.35, \"lon\"=>4.5}, \"name\"=>\"test\"}|...

One solution would be for the fingerprint filter to recursively sort all maps it encounters before concatenating the string. Alternatively, it should log a warning whenever it encounters a map, so that the user knows the fingerprint is non-deterministic.
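
To make the first suggestion concrete, here is a minimal Ruby sketch of recursive sorting before concatenation. It is not the plugin's code: `deep_sort` and `stable_fingerprint` are hypothetical helper names, the `|key|value|` layout only imitates the trace output above, and HMAC-SHA1 is assumed as a stand-in for the keyed SHA1 method.

```ruby
require "openssl"

# Hypothetical helper (not part of the plugin): returns a copy of `value`
# in which every Hash, at any depth, has its keys in sorted order.
def deep_sort(value)
  case value
  when Hash
    value.sort_by { |k, _| k.to_s }.map { |k, v| [k, deep_sort(v)] }.to_h
  when Array
    value.map { |v| deep_sort(v) }
  else
    value
  end
end

# Hypothetical helper: concatenates the sorted event into a "|key|value|"
# string and signs it. HMAC-SHA1 is an assumption made for this sketch.
def stable_fingerprint(event, key)
  source = deep_sort(event).map { |k, v| "|#{k}|#{v}" }.join + "|"
  OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new("SHA1"), key, source)
end

# Same content, different insertion order (as in the two trace lines).
a = { "site" => { "name" => "test", "location" => { "lat" => 13.35, "lon" => 4.5 } } }
b = { "site" => { "location" => { "lon" => 4.5, "lat" => 13.35 }, "name" => "test" } }

# The raw string form depends on insertion order, so this prints false.
puts a.to_s == b.to_s
# After recursive sorting both events yield the same digest, so this prints true.
puts stable_fingerprint(a, "FingerPrintSeed") == stable_fingerprint(b, "FingerPrintSeed")
```

Sorting on `k.to_s` keeps the comparison well defined even if string and symbol keys are mixed; arrays keep their order because element order in a list is usually meaningful.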

I would love to contribute to this, but my Ruby skills are very limited, so I am hoping that others can either provide some guidance or have the necessary skills to implement a fix.

Best Regards
Lasse

Hey @lassebv :)

I had precisely the same issue, and my debugging went much the same way as yours. I went ahead and wrote a patch for it, proposed in #41; feel free to check it out. I am not fluent in Ruby either, but working with Logstash forced me to learn some of it.