tech-greedy/singularity

Duplicate files in scanning requests

Closed this issue · 6 comments

Describe the bug
When scanning large datasets, after a machine restart or a singularity restart, singularity appears to create duplicate generation IDs that contain the same files.

Version
Latest

We have noticed duplicate pieceCid and dataCid values in the generationrequests table of the singularity database.
The _id values, however, are unique.
Looking up those _id values in inputfilelists and outputfilelists shows duplicate files in both tables.
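For anyone who wants to check their own database for this, here is a rough sketch of the comparison. It assumes the generationrequests records have already been exported to plain Python dicts carrying the `_id`, `pieceCid` and `dataCid` fields mentioned above. The function name and the sample values are made up for illustration and are not part of singularity.

```python
from collections import defaultdict


def find_duplicate_generations(records):
    """Return {(pieceCid, dataCid): [_id, ...]} for CID pairs seen more than once."""
    groups = defaultdict(list)
    for rec in records:
        piece_cid = rec.get("pieceCid")
        data_cid = rec.get("dataCid")
        # Skip records that carry no CIDs (assumed here to be unfinished requests).
        if piece_cid is None or data_cid is None:
            continue
        groups[(piece_cid, data_cid)].append(rec["_id"])
    # Keep only the CID pairs that map to more than one generation request.
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Hypothetical sample data, not taken from a real database.
sample = [
    {"_id": "a1", "pieceCid": "piece-1", "dataCid": "data-1"},
    {"_id": "a2", "pieceCid": "piece-1", "dataCid": "data-1"},
    {"_id": "a3", "pieceCid": "piece-2", "dataCid": "data-2"},
    {"_id": "a4"},
]

duplicates = find_duplicate_generations(sample)
for (piece_cid, data_cid), ids in duplicates.items():
    print(piece_cid, data_cid, ids)
```

The `_id` lists it returns are the ones to look up in inputfilelists and outputfilelists to confirm that the file lists really are the same.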

Is this project still alive?

Yes, we haven't gotten to this yet.

I'm guessing that in the meantime it's better to delete and re-scan the whole thing after a restart.

We are talking about a lot of data here: hundreds of TB.

Can this please get some more attention?

[Screenshot, 2022-12-30 at 21:37:09]

The major root cause of this issue has been fixed in the latest release:
https://github.com/tech-greedy/singularity/releases/tag/v2.1.1