ARK-Builders/ARK-Navigator

Duplicates detection


Right now, if a user has several copies of the same file (with the same content and therefore the same id), only one file is displayed.
When it is deleted, only one copy is removed, and one of the other copies is displayed the next time.
Tags are shared by all copies at once, because resources are content-addressed.
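A minimal sketch of why copies collapse into a single item, assuming the index maps the CRC32 of a file's bytes to the set of paths with that checksum and keys tags by the same id. `ResourceIndex`, `add`, `tag` and `displayed_items` are hypothetical names, not ARK-Navigator's actual API:

```python
import zlib
from collections import defaultdict


def resource_id(content: bytes) -> int:
    """Hypothetical content-addressed id: the CRC32 of the file bytes."""
    return zlib.crc32(content)


class ResourceIndex:
    """Hypothetical index: one entry per id, possibly many paths per id."""

    def __init__(self) -> None:
        self.paths_by_id: dict[int, set[str]] = defaultdict(set)
        self.tags_by_id: dict[int, set[str]] = defaultdict(set)

    def add(self, path: str, content: bytes) -> int:
        rid = resource_id(content)
        # A copy with identical bytes lands under the same id.
        self.paths_by_id[rid].add(path)
        return rid

    def tag(self, rid: int, tag: str) -> None:
        # Tags hang off the id, so every copy with this id shares them.
        self.tags_by_id[rid].add(tag)

    def displayed_items(self) -> list[int]:
        # The UI shows one item per id, not one per path.
        return list(self.paths_by_id)
```

Adding `photos/cat.jpg` and `backup/cat.jpg` with identical bytes yields one id, one displayed item, and a single shared tag set.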

Most likely, having several copies of the same resource is a user mistake. But it might also be intentional, as a backup mechanism. Given that the app is meant to be used in a somewhat distributed setup (with an external syncing app like Syncthing), such a backup should be unnecessary.

It seems like a good idea to detect such duplicates and show them to the user as a counter on the single item.
The [delete] button should then be replaced by two buttons: [delete copies] and [delete all]. It might be simpler to suggest that the user remove duplicates when any are found. Alternatively, a preference could be added to the settings screen.
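A minimal sketch of the counter and the two proposed buttons, assuming a hypothetical mapping `paths_by_id` from resource id to the set of file paths with that id. `copy_counts`, `delete_copies` and `delete_all` are illustrative names, and the `delete` callback stands in for real file removal (for example `os.remove`):

```python
from typing import Callable


def copy_counts(paths_by_id: dict[int, set[str]]) -> dict[int, int]:
    """Badge values: number of copies, only for items that have duplicates."""
    return {rid: len(paths) for rid, paths in paths_by_id.items() if len(paths) > 1}


def delete_copies(paths_by_id: dict[int, set[str]], rid: int, keep: str,
                  delete: Callable[[str], None]) -> None:
    """[delete copies]: remove every path of `rid` except the one in `keep`."""
    paths = paths_by_id[rid]
    if keep not in paths:
        raise ValueError(f"{keep!r} is not a copy of resource {rid}")
    for path in sorted(paths - {keep}):
        delete(path)
    paths_by_id[rid] = {keep}


def delete_all(paths_by_id: dict[int, set[str]], rid: int,
               delete: Callable[[str], None]) -> None:
    """[delete all]: remove every copy, so the item disappears entirely."""
    for path in sorted(paths_by_id.pop(rid)):
        delete(path)
```

Passing the deletion as a callback keeps the grouping logic testable without touching the file system; a prompted deletion screen could call `delete_copies` with whichever copy the user chose to keep.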

Sketch of a prompted deletion screen:
[Screenshot 2021-11-20 at 16 07 24]

Duplicate detection is more important than it may seem, because duplicates can be false: two or more resources might get the same id due to collisions of the hash function. Currently, we don't use cryptographic hash functions, for performance reasons. We use CRC32 instead, and its probability of collisions might be fairly high.
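The collision probability can be estimated with the birthday bound. A minimal sketch, assuming CRC32 values of distinct files behave like independent, uniformly distributed 32-bit numbers (an idealization); the function name is hypothetical:

```python
import math


def crc32_collision_probability(n: int, bits: int = 32) -> float:
    """Approximate P(at least one id collision among n distinct files).

    Birthday bound: p ~= 1 - exp(-n(n-1) / (2 * 2^bits)).
    Assumes checksums are uniformly random, which real data may not satisfy.
    """
    space = 2 ** bits
    # expm1 keeps precision when the exponent is tiny (small n).
    return -math.expm1(-n * (n - 1) / (2 * space))
```

Under that assumption, a library of 10,000 distinct files has roughly a 1.2% chance of at least one false duplicate, and the chance reaches about 50% near 77,000 files.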

Open questions:

  1. Probability of collisions for the CRC32 function.
  2. How can we make false duplicates appear as different resources?
  • We could group the items with the same id and show the number of items in the group as a badge.
  • We could let the user tap on the group to see details of the duplicate files, such as file name and location.
  • We could also add a button in that view that lets the user mark a file as not a duplicate, or as one to be treated as a separate file; we would then change that item's id so that it shows up as a separate item.
  • This would cover the case where the user made a duplicate on purpose and wants to tag it differently from the original.
  • This would also cover the case where the user finds an item that isn't actually a duplicate and wants to ungroup it from the other duplicates.
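One possible approach to the second open question, sketched under assumptions: confirm each CRC32 group by hashing full contents with SHA-256 (a swapped-in technique, not something the app currently does), so byte-identical files stay grouped and false duplicates split apart; and store a per-path id override when the user ungroups an item manually. `split_false_duplicates`, `IdOverrides` and the `read` callback are hypothetical names:

```python
import hashlib
import uuid
import zlib
from collections import defaultdict
from typing import Callable


def split_false_duplicates(paths: set[str],
                           read: Callable[[str], bytes]) -> list[set[str]]:
    """Split one CRC32 group into sets of byte-identical files.

    The slower SHA-256 runs only on files that already share a CRC32 id,
    so the cheap hash still does most of the work.
    """
    by_digest: dict[str, set[str]] = defaultdict(set)
    for path in paths:
        by_digest[hashlib.sha256(read(path)).hexdigest()].add(path)
    return list(by_digest.values())


class IdOverrides:
    """Hypothetical per-path ids set when the user ungroups an item."""

    def __init__(self) -> None:
        self.by_path: dict[str, str] = {}

    def resolve(self, path: str, content: bytes) -> str:
        # Default id is the content address; a stored override wins.
        return self.by_path.get(path, f"{zlib.crc32(content):08x}")

    def ungroup(self, path: str, content: bytes) -> str:
        # Keep the CRC32 prefix for readability, add a random suffix
        # so the new id no longer matches the rest of the group.
        new_id = f"{zlib.crc32(content):08x}-{uuid.uuid4().hex[:8]}"
        self.by_path[path] = new_id
        return new_id
```

Keying overrides by path is a simplification: an override would be lost if the file is moved or renamed, so a real implementation would need to persist it somewhere more stable.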