PanakoStrategy Query Logic - allow duplicate fingerprint hash?
Possible minor refactoring to improve the recognition rate.
Testing Result
Experimenting with whether duplicate fingerprint hashes are processed produced an unexpected result: the recognition rate improved when duplicate fingerprints were not considered. However, this might not suit every use case for the query algorithm.
Suggestion
Add a boolean flag that controls whether duplicate fingerprints are allowed. See the pseudocode below:
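// query
// One possible duplicate check (an assumption): remember the hashes seen so far.
Set<Long> seenHashes = new HashSet<>(); // needs java.util.Set and java.util.HashSet
for (PanakoFingerprint print : prints) {
    long hash = print.hash();
    // Set.add returns false if the hash was already present
    boolean hashNotADuplicate = seenHashes.add(hash);
    if (allowDuplicates || hashNotADuplicate) {
        db.addToQueryQueue(hash);
    }
    // a later print with the same hash replaces the earlier map entry
    printMap.put(hash, print);
}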
Hi, thanks for the suggestion.
The reason for not allowing duplicate hashes is twofold:
If a hash is common, it (almost by definition) does not have much discriminative power. The idea implemented here is that such hashes can be safely ignored.
Another reason is performance: no storage space or computation is wasted on hashes with little discriminative power. While some hash collisions are allowed, having too many could hurt query performance.
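To get a feel for how many hashes such a flag would affect, one could count how often each hash occurs within a single query. The sketch below is only an illustration: the class and method names are hypothetical and not part of Panako, and it works on plain long values rather than PanakoFingerprint objects.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Panako.
final class HashFrequency {

    /** Counts how often each hash value occurs in one query. */
    static Map<Long, Integer> countOccurrences(long[] hashes) {
        Map<Long, Integer> counts = new HashMap<>();
        for (long hash : hashes) {
            // merge inserts 1 for a new key, otherwise adds 1 to the stored count
            counts.merge(hash, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up hash values, for illustration only.
        long[] queryHashes = {42L, 7L, 42L, 42L, 99L};
        Map<Long, Integer> counts = countOccurrences(queryHashes);
        counts.forEach((hash, n) ->
                System.out.println("hash " + hash + " occurs " + n + " time(s)"));
    }
}
```

Hashes with a count above one are the ones that would be queried repeatedly if duplicates were allowed.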
However, letting users choose would indeed be a good improvement. For small collections or powerful servers, the collisions may not be much of a problem. One idea would be to store the temporary prints in either a Set (to avoid duplicates) or an Array (to allow them).
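The Set-versus-Array idea could look roughly like the following. This is a minimal sketch under stated assumptions, not the actual PanakoStrategy code: QueryHashSelector and select are hypothetical names, hashes are plain long values, and an ArrayList stands in for the array.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper, not part of Panako.
final class QueryHashSelector {

    /**
     * Returns the hashes to query, in first-seen order.
     * With allowDuplicates == false, repeated hashes are dropped.
     */
    static List<Long> select(long[] hashes, boolean allowDuplicates) {
        Collection<Long> selected;
        if (allowDuplicates) {
            selected = new ArrayList<>();      // keeps every occurrence
        } else {
            selected = new LinkedHashSet<>();  // keeps only the first occurrence
        }
        for (long hash : hashes) {
            selected.add(hash);
        }
        return new ArrayList<>(selected);
    }

    public static void main(String[] args) {
        long[] hashes = {1L, 2L, 1L, 3L};
        System.out.println(select(hashes, true));
        System.out.println(select(hashes, false));
    }
}
```

A LinkedHashSet is used rather than a HashSet so that the query order stays the same in both modes; whether order matters for the real query queue is an assumption.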