cmusatyalab/coda

segmentation fault at "0x0000556d2b84ccb0 in Recov_GetStatistics () at venusrecov.cc:561"

krichter722 opened this issue · 4 comments

Running sudo rm -r /var/lib/coda/* && sudo touch /var/lib/coda/LOG /var/lib/coda/DATA && sudo apt-get install --reinstall coda-client causes the systemd unit coda-client to fail with

Dez 01 22:09:49 richter-Lenovo-IdeaPad-Z500 venus[10053]: Date: Thu 12/01/2016
Dez 01 22:09:49 richter-Lenovo-IdeaPad-Z500 venus[10053]: 22:09:49 Coda Venus, version 6.10.0
Dez 01 22:09:49 richter-Lenovo-IdeaPad-Z500 venus[10053]: 22:09:49 /var/lib/coda/LOG size is 0 bytes
Dez 01 22:11:19 richter-Lenovo-IdeaPad-Z500 systemd[1]: coda-client.service: Start operation timed out. Terminating.
Dez 01 22:11:19 richter-Lenovo-IdeaPad-Z500 venus[10053]: 22:11:19 fatal error -- Recov_GetStatistics: rvm_statistics failed (200)
Dez 01 22:11:19 richter-Lenovo-IdeaPad-Z500 venus[10053]: 22:11:19 Fatal Signal (11); pid 10110 becoming a zombie...
Dez 01 22:11:19 richter-Lenovo-IdeaPad-Z500 venus[10053]: 22:11:19 You may use gdb to attach to 10110
Dez 01 22:11:19 richter-Lenovo-IdeaPad-Z500 systemd[1]: Failed to start Coda Cache Manager.
Dez 01 22:11:19 richter-Lenovo-IdeaPad-Z500 systemd[1]: coda-client.service: Unit entered failed state.
Dez 01 22:11:19 richter-Lenovo-IdeaPad-Z500 systemd[1]: coda-client.service: Failed with result 'timeout'.

The debugger backtrace after attaching gdb with gdb attach [pid] is

(gdb) bt
#0  0x00007fcc5ccbfb96 in __GI___sigsuspend (set=set@entry=0x7ffd9ba06410)
    at ../sysdeps/unix/sysv/linux/sigsuspend.c:30
#1  0x0000556d2b878e70 in SigChoke (sig=11) at sighand.cc:246
#2  <signal handler called>
#3  0x0000556d2b85024f in VenusPrint (fd=5, argc=argc@entry=1, 
    argv=argv@entry=0x7ffd9ba06af0) at venusutil.cc:269
#4  0x0000556d2b850449 in VenusPrint (fp=<optimized out>, argc=argc@entry=1, 
    argv=argv@entry=0x7ffd9ba06af0) at venusutil.cc:221
#5  0x0000556d2b850493 in DumpState () at venusutil.cc:475
#6  0x0000556d2b85069f in DumpState () at venusutil.cc:471
#7  choke (file=file@entry=0x556d2b89e1a1 "venusrecov.cc", 
    line=line@entry=561, 
    fmt=fmt@entry=0x556d2b89e3b8 "Recov_GetStatistics: rvm_statistics failed (%d)") at venusutil.cc:193
#8  0x0000556d2b84ccb0 in Recov_GetStatistics () at venusrecov.cc:561
#9  0x0000556d2b84cede in Recov_GetStatistics () at venusrecov.cc:566
#10 RecovFlush (Force=Force@entry=1) at venusrecov.cc:568
#11 0x0000556d2b878d9e in SigExit (sig=<optimized out>) at sighand.cc:258
#12 <signal handler called>
#13 0x00007fcc5cd81c70 in __read_nocancel ()
    at ../sysdeps/unix/syscall-template.S:84
#14 0x00007fcc5df620d2 in read (__nbytes=2560, __buf=0x7ffd9ba07360, 
    __fd=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/unistd.h:44
#15 read_dev (dev=dev@entry=0x556d2bdd98a8, offset=<optimized out>, dest=dest@entry=0x7ffd9ba07360 "", length=length@entry=2560) at rvm_io.c:267
#16 0x00007fcc5df54eb7 in read_log_status (log=log@entry=0x556d2bdd9870, status_buf=status_buf@entry=0x0) at rvm_logstatus.c:690
#17 0x00007fcc5df5519f in open_log (dev_name=dev_name@entry=0x556d2bdd2930 "/var/lib/coda/LOG", log_ptr=log_ptr@entry=0x7ffd9ba07e60, status_buf=status_buf@entry=0x0, 
    rvm_options=rvm_options@entry=0x556d2bacdca0 <Recov_Options>) at rvm_logstatus.c:378
#18 0x00007fcc5df55394 in do_log_options (log_ptr=log_ptr@entry=0x7ffd9ba07ea0, rvm_options=rvm_options@entry=0x556d2bacdca0 <Recov_Options>) at rvm_logstatus.c:445
#19 0x00007fcc5df62cfe in do_rvm_options (rvm_options=rvm_options@entry=0x556d2bacdca0 <Recov_Options>) at rvm_status.c:127
#20 0x00007fcc5df514d3 in rvm_initialize (rvm_version=<optimized out>, rvm_version@entry=0x556d2b89e880 "RVM Interface Version 1.3  7 Mar 1994", 
    rvm_options=rvm_options@entry=0x556d2bacdca0 <Recov_Options>) at rvm_init.c:61
#21 0x0000556d2b84d8c0 in Recov_InitRVM () at venusrecov.cc:412
#22 RecovInit () at venusrecov.cc:241
#23 0x0000556d2b81c53b in main (argc=<optimized out>, argv=<optimized out>) at venus.cc:195

Experienced with 6.10.0-1+ubuntu16.10 on Ubuntu 16.10.

Why did you create empty files for LOG and DATA?

The removal was intended as a cleanup action, and venus refuses to start if those files don't exist. I figured venus was capable of dealing with empty files (it would be nice if it were).

This log entry is probably related.

/var/lib/coda/LOG size is 0 bytes

Try starting it as venus -init once.
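
A quick way to spot this situation before starting the service is to check whether LOG and DATA are missing or zero bytes long. The following is only a minimal sketch in Python: the helper name find_unusable_rvm_files is hypothetical and not part of Coda, and the paths are the ones from this report.

```python
import os


def find_unusable_rvm_files(paths):
    """Return (path, reason) pairs for files that are missing or zero bytes long."""
    problems = []
    for path in paths:
        if not os.path.exists(path):
            problems.append((path, "missing"))
        elif os.path.getsize(path) == 0:
            problems.append((path, "empty"))
    return problems


if __name__ == "__main__":
    # Paths taken from the report above; adjust if your cache lives elsewhere.
    for path, reason in find_unusable_rvm_files(
        ["/var/lib/coda/LOG", "/var/lib/coda/DATA"]
    ):
        print(f"{path} is {reason}; venus may need to be started with -init once")
```

The check only reports the condition. It does not try to repair anything, since initialization is left to venus itself.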

After venus -init I repeatedly get output similar to

Dez 01 22:47:17 richter-Lenovo-IdeaPad-Z500 venus[21529]: 22:47:17 Loading RVM data
Dez 01 22:47:17 richter-Lenovo-IdeaPad-Z500 venus[21529]: 22:47:17 fatal error -- Recov_LoadRDS: heap mismatch (50000000, d0488000) vs (50000000, d0488000)

in systemctl output and

22:47:17 Coda Venus, version 6.10.0
22:47:17 /var/lib/coda/LOG size is 3021091840 bytes
22:47:17 /var/lib/coda/DATA size is 12084357960 bytes
22:47:17 Loading RVM data
22:47:17 fatal error -- Recov_LoadRDS: heap mismatch (50000000, d0488000) vs (50000000, d0488000)
Assertion failed: 0, file "venusrecov.cc", line 506
***BackTrace***
/usr/sbin/venus(coda_assert+0x76)[0x5556d2ce7b76]
/usr/sbin/venus(_Z5chokePKciS0_z+0xc8)[0x5556d2ca65a8]
/usr/sbin/venus(_Z9RecovInitv+0x319)[0x5556d2ca3a09]
/usr/sbin/venus(main+0x2fb)[0x5556d2c7253b]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7f7ef57b93f1]
/usr/sbin/venus(_start+0x2a)[0x5556d2c7488a]
Sleeping forever.  You may use gdb to attach to process 21547.

in /var/log/coda/venus.err, but I cannot attach gdb in this case because the mentioned process doesn't exist.

/var/log/coda/venus.log reads

[ X(00) : 0000 : 22:56:42 ] Coda Venus, version 6.10.0
[ X(00) : 0000 : 22:56:42 ] Logfile initialized with LogLevel = 0 at Thu Dec  1 22:56:42 2016

[ X(00) : 0000 : 22:56:42 ] E StatsInit()
[ X(00) : 0000 : 22:56:42 ] L StatsInit()
*****  VenusPrint  *****

*** BEGIN RealmDB ***
00000003 realm 'NOT_REALLY_CODA.dpkg-tmp', refcount 0/0
00000002 realm 'NOT_REALLY_CODA', refcount 0/0
00000001 realm 'localhost', refcount 0/2
*** END RealmDB ***
Unix Rusage:
    times = (0, 0), rss = (3136, 0, 0, 0)
    page = (160, 3), swap = (0), block = (648, 160)
    msg = (0, 0), sig = (0), csw = (21, 0)

Vprocs: tbl = 0xc6a4f2f0, counter = 1, nprocs = 1
0xc7ca93c0 : Main             : id = (c7ca9b30 : 0), stack = (0 : 0), seq = 0, flags = (00)

VFS Operations
 Operation                 Counts                    Times
Root          :      0  [    0     0     0]  :    0.0 (  0.0)
OpenByFD      :      0  [    0     0     0]  :    0.0 (  0.0)
Open          :      0  [    0     0     0]  :    0.0 (  0.0)
Close         :      0  [    0     0     0]  :    0.0 (  0.0)
Ioctl         :      0  [    0     0     0]  :    0.0 (  0.0)
Getattr       :      0  [    0     0     0]  :    0.0 (  0.0)
Setattr       :      0  [    0     0     0]  :    0.0 (  0.0)
Access        :      0  [    0     0     0]  :    0.0 (  0.0)
Lookup        :      0  [    0     0     0]  :    0.0 (  0.0)
Create        :      0  [    0     0     0]  :    0.0 (  0.0)
Remove        :      0  [    0     0     0]  :    0.0 (  0.0)
Link          :      0  [    0     0     0]  :    0.0 (  0.0)
Rename        :      0  [    0     0     0]  :    0.0 (  0.0)
Mkdir         :      0  [    0     0     0]  :    0.0 (  0.0)
Rmdir         :      0  [    0     0     0]  :    0.0 (  0.0)
Symlink       :      0  [    0     0     0]  :    0.0 (  0.0)
Readlink      :      0  [    0     0     0]  :    0.0 (  0.0)
Fsync         :      0  [    0     0     0]  :    0.0 (  0.0)
Vget          :      0  [    0     0     0]  :    0.0 (  0.0)
Signal        :      0  [    0     0     0]  :    0.0 (  0.0)
Replace       :      0  [    0     0     0]  :    0.0 (  0.0)
Flush         :      0  [    0     0     0]  :    0.0 (  0.0)
PurgeUser     :      0  [    0     0     0]  :    0.0 (  0.0)
ZapFile       :      0  [    0     0     0]  :    0.0 (  0.0)
ZapDir        :      0  [    0     0     0]  :    0.0 (  0.0)
PurgeFid      :      0  [    0     0     0]  :    0.0 (  0.0)
OpenByPath    :      0  [    0     0     0]  :    0.0 (  0.0)
Resolve       :      0  [    0     0     0]  :    0.0 (  0.0)
Reintegrate   :      0  [    0     0     0]  :    0.0 (  0.0)
Statfs        :      0  [    0     0     0]  :    0.0 (  0.0)
Store         :      0  [    0     0     0]  :    0.0 (  0.0)
Release       :      0  [    0     0     0]  :    0.0 (  0.0)

RPC Operations:
 Operation    	Good  Bad   Time MGood  MBad MTime   RPCR MRPCR
GetAttr              0     0   0.0     0     0   0.0     0     0
GetACL               0     0   0.0     0     0   0.0     0     0
Fetch                0     0   0.0     0     0   0.0     0     0
y                    0     0   0.0     0     0   0.0     0     0
SetACL               0     0   0.0     0     0   0.0     0     0
y                    0     0   0.0     0     0   0.0     0     0
y                    0     0   0.0     0     0   0.0     0     0
y                    0     0   0.0     0     0   0.0     0     0
y                    0     0   0.0     0     0   0.0     0     0
y                    0     0   0.0     0     0   0.0     0     0
y                    0     0   0.0     0     0   0.0     0     0
y                    0     0   0.0     0     0   0.0     0     0
y                    0     0   0.0     0     0   0.0     0     0
y                    0     0   0.0     0     0   0.0     0     0
GetRootVolume        0     0   0.0     0     0   0.0     0     0
y                    0     0   0.0     0     0   0.0     0     0
GetVolumeStatus      0     0   0.0     0     0   0.0     0     0
SetVolumeStatus      0     0   0.0     0     0   0.0     0     0
DisconnectFS         0     0   0.0     0     0   0.0     0     0
GetTime              0     0   0.0     0     0   0.0     0     0
nExpired             0     0   0.0     0     0   0.0     0     0
y                    0     0   0.0     0     0   0.0     0     0
GetStatistics        0     0   0.0     0     0   0.0     0     0
GetVolumeInfo        0     0   0.0     0     0   0.0     0     0
y                    0     0   0.0     0     0   0.0     0     0
GetVolumeLocation     0     0   0.0     0     0   0.0     0     0
y                    0     0   0.0     0     0   0.0     0     0
y                    0     0   0.0     0     0   0.0     0     0
COP2                 0     0   0.0     0     0   0.0     0     0
Resolve              0     0   0.0     0     0   0.0     0     0
y                    0     0   0.0     0     0   0.0     0     0
y                    0     0   0.0     0     0   0.0     0     0
Repair               0     0   0.0     0     0   0.0     0     0
SetVV                0     0   0.0     0     0   0.0     0     0
y                    0     0   0.0     0     0   0.0     0     0
AllocFids            0     0   0.0     0     0   0.0     0     0
iceAllocFids         0     0   0.0     0     0   0.0     0     0
ValidateAttrs        0     0   0.0     0     0   0.0     0     0
y                    0     0   0.0     0     0   0.0     0     0
NewConnectFS         0     0   0.0     0     0   0.0     0     0
GetVolVS             0     0   0.0     0     0   0.0     0     0
ValidateVols         0     0   0.0     0     0   0.0     0     0
y                    0     0   0.0     0     0   0.0     0     0
y                    0     0   0.0     0     0   0.0     0     0
y                    0     0   0.0     0     0   0.0     0     0
y                    0     0   0.0     0     0   0.0     0     0
y                    0     0   0.0     0     0   0.0     0     0
y                    0     0   0.0     0     0   0.0     0     0
y                    0     0   0.0     0     0   0.0     0     0
y                    0     0   0.0     0     0   0.0     0     0
y                    0     0   0.0     0     0   0.0     0     0
y                    0     0   0.0     0     0   0.0     0     0
y                    0     0   0.0     0     0   0.0     0     0
y                    0     0   0.0     0     0   0.0     0     0
OpenReintHandle      0     0   0.0     0     0   0.0     0     0
QueryReintHandle     0     0   0.0     0     0   0.0     0     0
SendReintFragment     0     0   0.0     0     0   0.0     0     0
CloseReintHandle     0     0   0.0     0     0   0.0     0     0
Reintegrate          0     0   0.0     0     0   0.0     0     0
y                    0     0   0.0     0     0   0.0     0     0
y                    0     0   0.0     0     0   0.0     0     0
y                    0     0   0.0     0     0   0.0     0     0
GetAttrPlusSHA       0     0   0.0     0     0   0.0     0     0
ValidateAttrsPlusSHA     0     0   0.0     0     0   0.0     0     0

RPC Packets:
RPC2:
   Sent:           Total        Retrys  Busies   Naks
      Uni:        0 : 0             0       0       0
      Multi:      0 : 0             0       0       0
   Received:       Total          Replys       Reqs       Busies    Bogus    Naks
      Uni:        0 : 0             0 : 0       0 : 0       0 : 0       0       0
      Multi:      0 : 0             0 : 0       0 : 0       0 : 0       0       0
SFTP:
   Sent:           Total        Starts     Datas       Acks    Naks   Busies
      Uni:        0 : 0             0       0 : 0         0       0       0
      Multi:      0 : 0             0       0 : 0         0       0       0
   Received:       Total        Starts     Datas       Acks    Naks   Busies
      Uni:        0 : 0             0       0 : 0         0       0       0
      Multi:      0 : 0             0       0 : 0         0       0       0

connent: 0, 0, 0
srvent: 0, 0, 0
mgrpent: 0, 0, 0
vsgent: 0, 0, 0
volrep: 0, 0, 0
repvol: 0, 0, 0
binding: 0, 0, 0
namectxt: 0, 0, 0
resent: 0, 0, 0
cop2ent: 0, 0, 0
msgent: 0, 0, 0
************************

[ X(00) : 0000 : 22:56:42 ] TERM: About to terminate venus

There's an unanswered mailing list post from 2004 about this.

Actually, the heap mismatch in that mailing list post is (0, 0) vs (50000000, 59fe000), which I think is probably caused by the RVM data file being zero-filled instead of correctly initialized for RDS.

Your case is more curious: (50000000, d0488000) vs (50000000, d0488000) seems to indicate that RDS initialization completed, and I'm not sure why it claims there is a mismatch unless some other version check is failing.
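
One thing that may be worth ruling out: the two pairs could differ only in bits that the error message does not show. This is a guess from the numbers in the logs above, not something confirmed in this thread or checked against venusrecov.cc. The DATA file is 12084357960 bytes, which is more than fits in 32 bits, while d0488000 is an 8-digit hex value. The sketch below, in Python purely for illustration, shows that a 64-bit value and its 32-bit truncation look identical once both are masked to 32 bits. The helper as_printed_32 and the candidate value are hypothetical, and treating the second number of each pair as a length is an assumption.

```python
MASK32 = 0xFFFFFFFF


def as_printed_32(value):
    """Format value as hex after keeping only its low 32 bits."""
    return format(value & MASK32, "x")


data_size = 12084357960   # DATA size reported in venus.err
printed_len = 0xD0488000  # second value of each pair in the error message

# Hypothetical 64-bit value: upper bits taken from the DATA size,
# low 32 bits equal to the value that was printed.
candidate = ((data_size >> 32) << 32) | printed_len

print(hex(candidate))                       # 0x2d0488000
print(candidate == printed_len)             # False: the values differ
print(as_printed_32(candidate) == as_printed_32(printed_len))  # True: same text
print(data_size - candidate)                # 11080
```

If something like this is happening, the real value would be 0x2d0488000, which is 11080 bytes less than the DATA file size. Whether venus actually stores or prints the value this way would need to be checked in the source.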