panic: unexpected ID string
Opened this issue · 15 comments
I'm trying to restore some corrupted backup files and since par2cmd doesn't seem to work (see this bug report ) I've tried par2go, however the process fails with an error:
andrea@marcopolo:~/par2-repair$ ~/go/bin/par v /mnt/casa/storage/1/Full-0005 [0] Loading volume file "/mnt/casa/storage/1/Full-0005" failed: unexpected ID string panic: unexpected ID string goroutine 1 [running]: main.main() /home/andrea/go/src/github.com/akalin/gopar/cmd/par/main.go:426 +0xf7c
Can you pass in the path to the .par2 file directly, e.g. ~/go/bin/par v /mnt/casa/storage/1/Full-0005.par2
? Currently the par1 vs. par2 detection is very dumb and will treat it as par1 unless the file ends in .par2 (which is what's causing the error).
I'd be curious to see if gopar runs into a similar problem as par2cmdline did!
I've run a test and the result is the same as with par2cmdline: the file is not repaired and, to be more precise, the "repaired" file is the same as the original Full-0005 file.
Here is the (cut down) log of the repair operation:
Loaded file description packet for "Full-0005" (ID=0be49d13888f6c69ea09b2307d58f0dd, 19327343417 bytes) Loaded checksums for file with ID 0be49d13888f6c69ea09b2307d58f0dd Loaded main packet: slice byte count=9663672, recovery set size=1, non-recovery set size=0 Loaded creator packet with client ID "Created by par2cmdline version 0.7.4." Hash mismatch for "Full-0005" (ID 0be49d13888f6c69ea09b2307d58f0dd) [1/1] Loaded data file "Full-0005" (19327343417 bytes, 1999 hits, 9663672 misses) Corrupt data chunk: "Full-0005" (ID 0be49d13888f6c69ea09b2307d58f0dd), bytes 18148376016 to 18158039687 Loaded recovery packet: exponent=0, byte count=9663672 [1] Loaded volume file "Full-0005.vol000+001.par2" Loaded recovery packet: exponent=1, byte count=9663672 Loaded recovery packet: exponent=2, byte count=9663672 [2] Loaded volume file "Full-0005.vol001+002.par2" Loaded recovery packet: exponent=3, byte count=9663672 Loaded recovery packet: exponent=4, byte count=9663672 Loaded recovery packet: exponent=5, byte count=9663672 Loaded recovery packet: exponent=6, byte count=9663672 [3] Loaded volume file "Full-0005.vol003+004.par2" Loaded recovery packet: exponent=7, byte count=9663672 Loaded recovery packet: exponent=8, byte count=9663672 Loaded recovery packet: exponent=9, byte count=9663672 Loaded recovery packet: exponent=10, byte count=9663672 Loaded recovery packet: exponent=11, byte count=9663672 [.....] Loaded recovery packet: exponent=105, byte count=9663672 Loaded recovery packet: exponent=106, byte count=9663672 Loaded recovery packet: exponent=107, byte count=9663672 Loaded recovery packet: exponent=108, byte count=9663672 Loaded recovery packet: exponent=109, byte count=9663672 Loaded recovery packet: exponent=110, byte count=9663672
Is the log supposed to end like this, without a message of "repair terminated" or something?
Hmm, the I/O for gopar is quite unoptimized, since it just loads everything into memory, so it's entirely possible it's just running out of memory and getting killed. I'll take a closer look...
One thing you can try: it seems it's running out of memory when trying to load all of the recovery packets, but you only need one. If you move all but one or two recovery file out of the way, perhaps it can make further progress?
I noticed that it needs a lot of memory ;) but if it runs out of memory usually either the process is killed and the system reports it, or the system hangs. I'll try with only one file and see what happens.
Here are the results of the test with only one file:
Loaded file description packet for "Full-0005" (ID=0be49d13888f6c69ea09b2307d58f0dd, 19327343417 bytes) Loaded checksums for file with ID 0be49d13888f6c69ea09b2307d58f0dd Loaded main packet: slice byte count=9663672, recovery set size=1, non-recovery set size=0 Loaded creator packet with client ID "Created by par2cmdline version 0.7.4." Hash mismatch for "Full-0005" (ID 0be49d13888f6c69ea09b2307d58f0dd) [1/1] Loaded data file "Full-0005" (19327343417 bytes, 1999 hits, 9663672 misses) Corrupt data chunk: "Full-0005" (ID 0be49d13888f6c69ea09b2307d58f0dd), bytes 18148376016 to 18158039687 Loaded recovery packet: exponent=3, byte count=9663672 Loaded recovery packet: exponent=4, byte count=9663672 Loaded recovery packet: exponent=5, byte count=9663672 Loaded recovery packet: exponent=6, byte count=9663672 [1] Loaded volume file "Full-0005.vol003+004.par2" Repair error: hash mismatch in reconstructed data
I've also tried with vol000+001 and vol001+2 with the same result.
Does that tells you something?
Hunh, interesting. So par2 stores the md5 hash of every file in the set, and that error is returned whenever the md5 of the reconstructed file still doesn't match the stored hash. This seems consistent with the behavior of par2cmdline in the other bug.
In this case, gopar refuses to write out the reconstructed file, whereas it looks like par2cmdline writes it out anyway.
I could perhaps add more output saying what the expected vs. actual hashes are, and also maybe add a switch to force writing out the reconstructed file even if the hashes don't match. Then you can perhaps compare the output of gopar vs. the output of par2cmdline to see if they match (which I expect), and that would mostly rule out a bug in par2cmdline's reconstruction. What do you think?
One more question: did you run gopar on the original corrupted file, or on the output of par2cmdline's reconstruction?
Yes, more output is welcome :) also, if I understand correctly, gopar overwrites the original file with the corrected data: maybe you could make it write a completely new file instead, that way we can rule out any filesystem issue in overwriting the file.
PS: the original file is identical to the one reconstructed by par2cmdline (same md5sum).
Ok, I added some more output -- I put it in the issue5-debug branch, so please compile from that. I also made it so it writes files to <filename>.recovered
.
Oh, so par2cmdline wrote out a new repaired file, but it's identical to the original file? Interesting...one theory that came to mind is this sequence of events during the creation of the par2 files:
- par2cmdline scans
Full-0005
to compute its MD5 hash and the other block checksums that par2 computes. - Something else comes along and modifies the block between bytes 18148376016 to 18158039687.
- par2cmdline computes the recovery data, but using the modified file.
- par2cmdline writes out the par2 data with the old hashes but the new recovery data.
Then any subsequent run of par2cmdline/gopar would detect the corrupted block, but recovery would simply reconstruct the corrupted block.
Anyway, that's just a theory, but it seems to fit with all the observed behaviors. Unfortunately, if that's the explanation I don't think there's any practical way to recover the original data without knowing the structure of the file. :( I'll sleep on it and see if I can come up with another explanation that's more hopeful...
par2cmdline overwrites the original file, but in the end the repaired file is not changed. I'm not sure I understood your theory. Note also that this is not the only file where recovery fails.
Anyway, here is the results: please check also the preliminary steps to confirm I'm correctly running all the modified sources:
andrea@marcopolo:~/go/src/github.com/akalin/gopar$ git pull warning: L'esecuzione di un pull senza specificare come riconciliare branch divergenti non è consigliata. È possibile sopprimere questo messaggio eseguendo uno dei seguenti comandi prima di eseguire il prossimo pull: git config pull.rebase false # merge (strategia predefinita) git config pull.rebase true # rebase git config pull.ff only # esegui solo fast forward Puoi sostituire "git config" con "git config --global" per impostare una preferenza predefinita per tutti i repository. Puoi anche passare gli argomenti --rebase, --no-rebase o --ff-only sulla riga di comando per eseguire l'override del valore predefinito configurato per una singola invocazione. remote: Enumerating objects: 28, done. remote: Counting objects: 100% (28/28), done. remote: Compressing objects: 100% (12/12), done. remote: Total 28 (delta 12), reused 22 (delta 12), pack-reused 0 Decompressione degli oggetti in corso: 100% (28/28), 22.64 KiB | 526.00 KiB/s, fatto. Da https://github.com/akalin/gopar 93c63e8..e638488 master -> origin/master * [nuovo branch] issue5-debug -> origin/issue5-debug Aggiornamento di 93c63e8..e638488 Fast-forward cmd/par/main.go | 59 ++++++++++++++++++++++++++++++++++++++--------------------- par1/decoder.go | 31 ++++++++++++------------------- par1/decoder_test.go | 24 ++++++++++++------------ par1/encoder_test.go | 4 ++-- par2/decoder.go | 57 ++++++++++++++++++++++----------------------------------- par2/decoder_test.go | 24 ++++++++++++------------ par2/encoder_test.go | 4 ++-- rsec16/coder.go | 46 ++++++++++++++++++++++++++++++++++++++++------ rsec16/coder_test.go | 28 +++++++++++++++++++++------- 9 files changed, 161 insertions(+), 116 deletions(-) andrea@marcopolo:~/go/src/github.com/akalin/gopar$ git checkout issue5-debug Branch 'issue5-debug' impostato per tracciare il branch remoto 'issue5-debug' da 'origin'. Si è passati a un nuovo branch 'issue5-debug' andrea@marcopolo:/mnt/casa/bareos-recover/gopar$ go run github.com/akalin/gopar/cmd/par r Full-0005.par2 Loaded file description packet for "Full-0005" (ID=0be49d13888f6c69ea09b2307d58f0dd, 19327343417 bytes) Loaded checksums for file with ID 0be49d13888f6c69ea09b2307d58f0dd Loaded main packet: slice byte count=9663672, recovery set size=1, non-recovery set size=0 Loaded creator packet with client ID "Created by par2cmdline version 0.7.4." Hash mismatch for "Full-0005" (ID 0be49d13888f6c69ea09b2307d58f0dd): 16k hash matches (868ce610521d5d69d1d26c664cc5d366) but expected hash (dc31cf125b1fd387dcb49bd86444bd84) doesn't match actual hash (a1ceb53101e86333edc69c8e057f5f9e) [1/1] Loaded data file "Full-0005" (19327343417 bytes, 1999 hits, 9663672 misses) Corrupt data chunk: "Full-0005" (ID 0be49d13888f6c69ea09b2307d58f0dd), bytes 18148376016 to 18158039687 Loaded recovery packet: exponent=3, byte count=9663672 Loaded recovery packet: exponent=4, byte count=9663672 Loaded recovery packet: exponent=5, byte count=9663672 Loaded recovery packet: exponent=6, byte count=9663672 [1] Loaded volume file "Full-0005.vol003+004.par2" DEBUG: hash mismatch error for file Full-0005, 16k hash matches (868ce610521d5d69d1d26c664cc5d366) but expected hash (dc31cf125b1fd387dcb49bd86444bd84) doesn't match actual hash (27b03b2874e1456da3745bc38c7bf772)[1/1] Wrote data file "Full-0005.recovered" (19327343417 bytes) Repaired files: [Full-0005] andrea@marcopolo:/mnt/casa/bareos-recover/gopar$ md5sum Full-0005 a1ceb53101e86333edc69c8e057f5f9e Full-0005 andrea@marcopolo:/mnt/casa/bareos-recover/gopar$ md5sum Full-0005.recovered 27b03b2874e1456da3745bc38c7bf772 Full-0005.recovered
Hi akalin, I'm Mike Nahas, the designer of the PAR2 spec.
So, apallazi, I think what akalin was saying is: Is it possible that files were changing while generating the original par2 files? If they were, different versions of the input files might have been read at different times. The file data is used (1) to generate md5sums for each file and (2) to generate the recovery data. If those reads were different, the recovery data may be generating a file that doesn't match the md5sum for it.
BTW, if the data was compressed, most compression/archiving programs (ZIP, TAR, etc.) can still regenerate most of the files in them. So, hopefully, you haven't lost all your data. (I know that is small comfort after you used PAR2 just to avoid that issue. Sorry.)
Mike
@apalazzi that looks good. But yeah, unless you have any other ideas for things to try, the current theory of Full-0005 being modified during par2 generation seems most likely. Out of curiosity, what format is Full-0005? I wonder if it has some sort of integrity checking itself...
@mdnahas one weirdness is that gopar writes out a new file with a different md5, whereas it sounds like par2cmdline writes out a repaired file that's the same as the original one. Is that expected? Does par2cmdline do something different if it detects that the recovered file doesn't match the hash?
@mdnahas Don't worry, the data should be safe (but see below), I was just double-checking if everything was ok with the backup.
@akalin the file is a bareos backup file, it has some integrity check but not date recovery capabilities.
I've recently discovered that one of the ram modules was bad; even though the backup dates back to 2017, if the ram was already bad at the time that would explain the issue. If this is the case I guess it's impossible to recover the corrupted files... or is there a way to still recover the data?