The fingerprint is non-deterministic for events with a nested map
When an event contains a nested data structure, the order in which that data is hashed is not deterministic. A specific use case for us is that we add a location to each log event as a nested data structure:
mutate { add_field => { "[site][name]" => "test" } }
mutate {
  add_field => {
    "[site][location][lat]" => "13.35"
    "[site][location][lon]" => "4.5"
  }
}
fingerprint {
  target => "[@metadata][fingerprint]"
  method => "SHA1"
  key => "FingerPrintSeed"
  concatenate_all_fields => true
}
When trace logging is enabled and the string sent to the SHA1 algorithm is monitored, one can observe different orderings (the dots indicate that this is a substring of the full string):
...|site|{\"name\"=>\"test\", \"location\"=>{\"lat\"=>13.35, \"lon\"=>4.5}}|...
...|site|{\"location\"=>{\"lat\"=>13.35, \"lon\"=>4.5}, \"name\"=>\"test\"}|...
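To illustrate why the two orderings matter, here is a minimal Ruby sketch. It is not the plugin's code: `source_string` and `fingerprint_of` are hypothetical helpers, and the digest is unkeyed for simplicity (the real filter also mixes in the `key`). It only shows that two maps Ruby considers equal can still serialize, and therefore hash, differently.

```ruby
require "digest"

# Hypothetical stand-in for the concatenated string the filter builds.
# Interpolating a Hash calls Hash#to_s, which follows insertion order.
def source_string(site)
  "|site|#{site}|"
end

# Hypothetical helper: unkeyed SHA1 of the concatenated string.
def fingerprint_of(site)
  Digest::SHA1.hexdigest(source_string(site))
end

a = { "name" => "test", "location" => { "lat" => 13.35, "lon" => 4.5 } }
b = { "location" => { "lat" => 13.35, "lon" => 4.5 }, "name" => "test" }

puts a == b                                   # Hash equality ignores key order
puts source_string(a) == source_string(b)     # the serialized strings differ
puts fingerprint_of(a) == fingerprint_of(b)   # so the digests differ too
```

The same logical event can therefore produce two different fingerprints, depending only on the order in which the fields were inserted.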
One solution would be for the fingerprint filter to recursively sort all maps it encounters before concatenating the string. Alternatively, it should log a warning when a map is encountered, to tell the user that the fingerprint is non-deterministic.
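The recursive sort could look roughly like the sketch below. This is only an illustration of the idea, not a patch for the plugin: `deep_sort` is a hypothetical helper name, and it assumes that map keys can be compared by their string form and that array order is meaningful and should be kept.

```ruby
# Hypothetical helper, not part of the plugin: returns a copy of `value`
# in which every Hash, at any depth, has its keys in sorted order.
def deep_sort(value)
  case value
  when Hash
    # Sort by the string form of the key, then recurse into each value.
    value.sort_by { |k, _| k.to_s }.map { |k, v| [k, deep_sort(v)] }.to_h
  when Array
    # Keep the element order, but normalize any maps inside the array.
    value.map { |v| deep_sort(v) }
  else
    value
  end
end

a = { "name" => "test", "location" => { "lon" => 4.5, "lat" => 13.35 } }
b = { "location" => { "lat" => 13.35, "lon" => 4.5 }, "name" => "test" }

# After normalization both maps serialize to the same string.
puts deep_sort(a).to_s == deep_sort(b).to_s
```

Applying such a normalization to each field value before it is concatenated would make the string passed to the hash function independent of insertion order.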
I would love to contribute to this, but my Ruby skills are very limited, so I am hoping that others can either provide some guidance or have the necessary skills to implement a fix.
Best Regards
Lasse
Reported similarly here: #39
(related: https://twitter.com/_jakubhajek/status/1194615506583588864)