namecoin/namecoin-legacy

Crash in AppInit() upon start (EXCEPTION: NSt8ios_base7failureE)

Closed this issue · 9 comments

Hi,

On latest git, one of my namecoin installations is crashing upon start.
I suppose it might be reacting this way to corrupt files.

Is there any way I can help debug it, and hopefully find a way for namecoin to handle this?

solt@dev2:~/namecoin/src$ ./namecoind


************************
EXCEPTION: NSt8ios_base7failureE
CDataStream::read() : end of data
namecoin in AppInit()

terminate called after throwing an instance of 'std::ios_base::failure'
  what():  CDataStream::read() : end of data
Aborted
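For readers puzzled by the first line of that output: `NSt8ios_base7failureE` is the compiler-mangled spelling of the same type that the `terminate` line prints as `std::ios_base::failure`. A minimal sketch of turning one into the other, assuming GCC or Clang (which provide `abi::__cxa_demangle` in `<cxxabi.h>`); the helper name `demangle` is made up for this illustration:

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Hypothetical helper: demangle a GCC/Clang type name.
// Returns the input unchanged if it cannot be demangled.
std::string demangle(const char* mangled) {
    assert(mangled != nullptr);
    int status = 0;
    // With null buffer arguments, __cxa_demangle returns a malloc'd string
    // on success (status == 0) and a null pointer on failure.
    char* readable = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || readable == nullptr) return mangled;
    std::string result(readable);
    std::free(readable);
    return result;
}
```

Calling `demangle("NSt8ios_base7failureE")` yields the readable type name. From a shell, the `c++filt -t` tool does the same job.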

This means that one of the unserialisation routines fails. It is probably really related to corrupt data files (or a bug in the format upgrades done recently, but since the upgrade works in general, I don't see why it would fail for you). I don't really see what we could do about corrupt data files.
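To illustrate what "end of data" means here, the following is a minimal sketch of a length-prefixed read running off the end of a truncated buffer. `MiniStream`, `readRecord` and `isTruncated` are hypothetical stand-ins written for this example; they are not the real `CDataStream` code from serialize.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <ios>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a serialisation stream (not the real CDataStream).
class MiniStream {
public:
    explicit MiniStream(std::vector<char> bytes) : buf_(std::move(bytes)), pos_(0) {}

    // Copy n bytes out of the buffer, or throw if fewer than n remain.
    void read(char* out, std::size_t n) {
        assert(pos_ <= buf_.size());
        if (n > buf_.size() - pos_)
            throw std::ios_base::failure("MiniStream::read() : end of data");
        std::memcpy(out, buf_.data() + pos_, n);
        pos_ += n;
    }

private:
    std::vector<char> buf_;
    std::size_t pos_;
};

// Read a record stored as a one-byte length followed by that many bytes.
std::string readRecord(MiniStream& s) {
    char len = 0;
    s.read(&len, 1);
    std::string payload(static_cast<unsigned char>(len), '\0');
    if (!payload.empty()) s.read(&payload[0], payload.size());
    return payload;
}

// Returns true if decoding the buffer runs off the end of the data.
bool isTruncated(const std::vector<char>& bytes) {
    MiniStream s(bytes);
    try {
        readRecord(s);
        return false;
    } catch (const std::ios_base::failure&) {
        return true;
    }
}
```

A buffer whose length byte promises more bytes than are actually stored makes `isTruncated` return true, which is the same shape of failure as a database record that is shorter than the format expects.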

If you still want to find out more, you can run namecoind in a debugger (e.g., gdb with "catch throw") and look at the backtrace to see exactly where the exception is thrown.

I ran into this one once when I had renamed/deleted blkindex.dat IIRC. I had to rename my Namecoin directory and redownload the blockchain (once it is finished you can copy the wallet.dat to the new data folder).
Edit: Could be that I also mixed versions to provoke the error. Stick to the new version!

I'll most probably end up doing what phelixbtc suggests.
In the meantime, I recompiled with -ggdb3 and got a backtrace. The exception is thrown at:

Using host libthread_db library "/lib/i386-linux-gnu/i686/cmov/libthread_db.so.1".
[New Thread 0xb6d5db70 (LWP 8603)]
Catchpoint 1 (exception thrown), 0xb7eff160 in __cxa_throw ()
   from /usr/lib/i386-linux-gnu/libstdc++.so.6
(gdb) bt
#0  0xb7eff160 in __cxa_throw () from /usr/lib/i386-linux-gnu/libstdc++.so.6
#1  0x0806f891 in CDataStream::setstate (
    this=<error reading variable: Unhandled dwarf expression opcode 0xfa>,
    psz=0x83da2ec "CDataStream::read() : end of data", bits=4,
    this=<error reading variable: Unhandled dwarf expression opcode 0xfa>)
    at serialize.h:1028
#2  0x0807a834 in CDataStream::read (this=0xbffff064, pch=0xc390438 "",
    nSize=22) at serialize.h:1056
#3  0x08081bf7 in Unserialize<CDataStream, std::allocator<bool> > (is=...,
    v=..., nType=nType@entry=2, nVersion=nVersion@entry=37500)
    at serialize.h:590
#4  0x08082b5f in SerReadWrite<CDataStream, std::vector<bool> > (
    nVersion=37500, nType=2, obj=..., s=..., ser_action=...) at serialize.h:810
#5  CTxIndex::Unserialize<CDataStream> (this=0xbffff19c, s=..., nType=2,
    nVersion=37500) at main.h:794
#6  0x08082d71 in Unserialize<CDataStream, CTxIndex> (nVersion=37500, nType=2,
    is=..., a=...) at serialize.h:411
#7  operator>><CTxIndex> (obj=..., this=0xbffff064) at serialize.h:1125
#8  CDB::Read<std::pair<std::string, uint256>, CTxIndex> (
    this=this@entry=0xbffff240, key=..., value=...) at db.h:91
#9  0x08078779 in CTxDB::ReadTxIndex (this=0xbffff240, hash=..., txindex=...)
    at db.cpp:424
#10 0x080e2da0 in CWallet::ReacceptWalletTransactions (this=0xc380df0)
    at wallet.cpp:541
#11 0x0815874a in AppInit2 (argc=-1073744776, argv=0xbffff7a4) at init.cpp:467
#12 0x0815a2a8 in AppInit (argc=argc@entry=1, argv=argv@entry=0xbffff7a4)
    at init.cpp:116
#13 0x080536cb in main (argc=1, argv=0xbffff7a4) at init.cpp:102

Going up the trace, I found that dbset contains references to blkindex.dat and nameindex.dat.
I can't tell which one is causing the problem.

I know it's bad to suggest root causes, but could it be that a previous blkindex.dat rewrite didn't complete properly? Namecoind isn't running and this is my ~/.namecoin:
-rw-------  1 user user     630784 Jun 12 16:13 addr.dat
-rw-------  1 user user 1158031186 Jun 12 16:06 blk0001.dat
-rw-------  1 user user  517251072 Jun 17 12:50 blkindex.dat
-rw-------  1 user user  168435712 May 15 15:47 blkindex.dat.rewrite
drwx------  2 user user       4096 Jun 17 11:18 database
-rw-------  1 user user          0 Jun 17 11:21 db.log
-rw-------  1 user user     210693 Jun 17 17:47 debug.log
-rw-------  1 user user          0 Apr 25 13:32 .lock
-rw-r--r--  1 user user         67 Apr 28 09:55 namecoin.conf
-rw-------  1 user user   61640704 Jun 12 16:13 nameindex.dat
-rw-------  1 user user      94208 Jun 17 12:51 wallet.dat

From your backtrace it is clear that blkindex.dat is the problem. Also, as you observed, the rewrite should have finished. However, actually I believe that even if you killed the daemon while it was rewriting, there "shouldn't" be a problem (the database file would just be larger than necessary and contain lots of empty pages). However, I've not actually tested that, of course. So hopefully this doesn't indicate a "real" bug.

I don't know. Anyway, I removed blkindex.dat and restarted it and it's
synched up now and not crashing.
The incident is resolved, but perhaps the general question remains: how
should namecoind handle exceptions when trying to read a corrupted file?
Maybe with an error message saying which file could not be read and
suggesting its removal?
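A rough sketch of what such handling could look like: wrap the read, catch the deserialisation failure, and report the file name instead of aborting. Everything here (`ReadResult`, `guardedRead`, the message wording) is hypothetical and only illustrates the idea; it is not how namecoind is structured.

```cpp
#include <cassert>
#include <functional>
#include <ios>
#include <string>

// Outcome of a guarded database read.
struct ReadResult {
    bool ok;
    std::string message;  // empty on success
};

// Run readFn and turn a deserialisation failure into a message naming fileName.
// Both parameters are hypothetical; real code would wrap a call such as
// ReadTxIndex and know which database file it reads from.
ReadResult guardedRead(const std::string& fileName,
                       const std::function<void()>& readFn) {
    assert(static_cast<bool>(readFn));  // a callable must be supplied
    try {
        readFn();
        return {true, ""};
    } catch (const std::ios_base::failure& e) {
        return {false, "Error: could not read " + fileName + " (" + e.what() +
                       "). The file may be corrupt; consider moving it aside "
                       "and re-syncing."};
    }
}
```

The caller can then print `message` and exit cleanly. Whether removing the file is the right advice depends on the file, so the wording above is only a placeholder.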

On Tue, Jun 17, 2014 at 6:16 PM, Daniel Kraft notifications@github.com
wrote:

> From your backtrace it is clear that blkindex.dat is the problem. Also, as
> you observed, the rewrite should have finished. However, actually I believe
> that even if you killed the daemon while it was rewriting, there
> "shouldn't" be a problem (the database file would just be larger than
> necessary and contain lots of empty pages). However, I've not actually
> tested that, of course. So hopefully this doesn't indicate a "real" bug.


Reply to this email directly or view it on GitHub
#115 (comment).

Pysiak

Note that, as far as my understanding goes, you can't "recover" from a missing/corrupted blkindex.dat in the same way as you can recreate the nameindex. I believe that if you delete blkindex.dat (or have to delete it), then you also should remove the blk*.dat files since the whole blockchain will be downloaded again anyway. (Since I do not yet know fully how the networking code works, this may be wrong - but I believe it is the case. You should be able to see whether or not the blk0001.dat file is double its initial size after you have finished syncing.)

Well, that's what I thought too, and I figured the blk*.dat files needed to go
as well, so I removed them together with blkindex.dat yesterday. I forgot
to mention that. I'm not sure about the size; I think it's about the same now. I
was low on disk space then and have more now, but I removed the .rewrite file.
Perhaps if we knew how much disk space the rewrite requires,
namecoind could check whether there's enough?
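A sketch of such a check, under a loudly stated assumption: that a rewrite needs at most as much free space as the size of the file being rewritten. The thread does not establish the real requirement. The function names are hypothetical, and the code uses POSIX `stat` and `statvfs`, so it applies to Linux and similar systems only.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <sys/stat.h>
#include <sys/statvfs.h>

// Assumption (not confirmed in this thread): a rewrite needs at most as much
// extra space as the file it rewrites.
bool enoughSpace(std::uint64_t fileBytes, std::uint64_t availableBytes) {
    return availableBytes >= fileBytes;
}

// Check free space in dataDir before rewriting dataDir/fileName.
// Returns false if the file size or the free space cannot be determined.
bool canRewrite(const std::string& dataDir, const std::string& fileName) {
    assert(!dataDir.empty());
    const std::string path = dataDir + "/" + fileName;

    struct stat st;
    if (stat(path.c_str(), &st) != 0) return false;

    struct statvfs vfs;
    if (statvfs(dataDir.c_str(), &vfs) != 0) return false;

    // f_bavail counts blocks available to unprivileged users, in units of f_frsize.
    const std::uint64_t available =
        static_cast<std::uint64_t>(vfs.f_bavail) * vfs.f_frsize;
    return enoughSpace(static_cast<std::uint64_t>(st.st_size), available);
}
```

With something like this, the daemon could skip the rewrite, or warn and exit, when `canRewrite` returns false instead of running out of disk space partway through.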

On Wed, Jun 18, 2014 at 11:15 AM, Daniel Kraft notifications@github.com
wrote:

> Note that, as far as my understanding goes, you can't "recover" from a
> missing/corrupted blkindex.dat in the same way as you can recreate the
> nameindex. I believe that if you delete blkindex.dat (or have to delete
> it), then you also should remove the blk*.dat files since the whole
> blockchain will be downloaded again anyway. (Since I do not yet know fully
> how the networking code works, this may be wrong - but I believe it is the
> case. You should be able to see whether or not the blk0001.dat file is
> double its initial size after you have finished syncing.)



Pysiak

Closing as this issue did not bubble up any more.