"fatal error -- Assertion failed: file "fso_cfscalls2.cc", line 268" during file transfer
krichter722 opened this issue · 1 comments
After transferring 7 GB of data in approx. 100K files, venus crashes with:
00:02:39 fatal error -- Assertion failed: file "fso_cfscalls2.cc", line 268
00:02:39 RecovTerminate: clean shutdown
Assertion failed: 0, file "fso_cfscalls2.cc", line 268
***BackTrace***
/usr/sbin/venus(coda_assert+0x76)[0x56525a2bca66]
/usr/sbin/venus(_Z5chokePKciS0_z+0xc8)[0x56525a27b428]
/usr/sbin/venus(_ZN5fsobj7ReleaseEi+0x164)[0x56525a264764]
/usr/sbin/venus(_ZN5fsobj5CloseEij+0x24)[0x56525a2648a4]
/usr/sbin/venus(_ZN5vproc5closeEP11venus_cnodei+0x18b)[0x56525a29de2b]
/usr/sbin/venus(_ZN6worker4mainEv+0xbdd)[0x56525a24919d]
/usr/sbin/venus(_Z13VprocPreamblePv+0xbe)[0x56525a2990ae]
/usr/lib/coda/liblwp.so.2(+0x5d7c)[0x7f719d39bd7c]
/lib/x86_64-linux-gnu/libc.so.6(+0x357f0)[0x7f719c7587f0]
/lib/x86_64-linux-gnu/libc.so.6(sigsuspend+0x16)[0x7f719c758b26]
[0x7ffc3ed9ad00]
Sleeping forever. You may use gdb to attach to process 8098.
and the backtrace in gdb is:
#0 0x00007f719c7f02d0 in __nanosleep_nocancel () at ../sysdeps/unix/syscall-template.S:84
#1 0x00007f719c7f023a in __sleep (seconds=0, seconds@entry=1) at ../sysdeps/posix/sleep.c:55
#2 0x000056525a2bcaf2 in coda_assert (pred=pred@entry=0x56525a2d3c70 "0", file=file@entry=0x56525a2c3e2d "fso_cfscalls2.cc", line=line@entry=268) at coda_assert.c:66
#3 0x000056525a27b428 in choke (file=file@entry=0x56525a2c3e2d "fso_cfscalls2.cc", line=line@entry=268, fmt=fmt@entry=0x56525a2c0978 "Assertion failed: file \"%s\", line %d\n") at venusutil.cc:208
#4 0x000056525a264764 in fsobj::Release (this=this@entry=0x9b38c810, writep=writep@entry=1) at fso_cfscalls2.cc:268
#5 0x000056525a2648a4 in fsobj::Close (this=0x9b38c810, writep=1, uid=<optimized out>) at fso_cfscalls2.cc:313
#6 0x000056525a29de2b in vproc::close (this=this@entry=0x56525b2dba80, cp=cp@entry=0x15175930, flags=3) at vproc_vfscalls.cc:264
#7 0x000056525a24919d in worker::main (this=0x56525b2dba80) at worker.cc:1205
#8 0x000056525a2990ae in VprocPreamble (arg=arg@entry=0x56525b2dbb08) at vproc.cc:152
#9 0x00007f719d39bd7c in _thread (sig=<optimized out>) at lwp_ucontext.c:91
#10 <signal handler called>
#11 0x00007f719c758b26 in __GI___sigsuspend (set=0x7ffc3ed9abb0) at ../sysdeps/unix/sysv/linux/sigsuspend.c:30
#12 0x00007ffc3ed9ad00 in ?? ()
#13 0x00007ffc3ed9ad00 in ?? ()
#14 0x00007ffc3ed9ad01 in ?? ()
#15 0x00007ffc3ed9adee in ?? ()
#16 0x00007ffc3ed9ad00 in ?? ()
#17 0x00007ffc3ed9adee in ?? ()
#18 0x0000000000000000 in ?? ()
Experienced with version 6.11.2-1+ubuntu16.10 on Ubuntu 16.10.
The assertion is supposed to trigger there, because at that point we are trying to free a file object that still has 'open' references, and we should therefore never have gotten to this point.
So there is either a reference count leak somewhere or a race condition. 7 GB of data over 100K files isn't all that much; I actually walked the entire coda.cs.cmu.edu tree last week, which has quite a few more files than that. But that was a read-only action and I was only checking for conflicts, so there were probably not a whole lot of 'open/close' calls on the files (and those only for reading, if at all).
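To make the invariant concrete, here is a minimal sketch of the kind of bookkeeping involved. It is not the actual fsobj code from venus: the class CachedFile, its counters, and SafeToFree are hypothetical names made up for illustration, and the real check at fso_cfscalls2.cc:268 may test something different. The idea is only that every Open must be paired with exactly one Release, and an object is safe to free only when no opener is left.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for a cached file object; NOT the real fsobj from Coda.
class CachedFile {
public:
    // Record one more opener; writers are counted separately as well.
    void Open(bool writing) {
        ++openers_;
        if (writing) ++writers_;
        assert(writers_ <= openers_);  // every writer is also an opener
    }

    // Drop one opener. Throws (instead of aborting the process) when there is
    // no matching Open, so a caller or test can observe the mismatch.
    void Release(bool writing) {
        if (openers_ == 0 || (writing && writers_ == 0))
            throw std::logic_error("Release without matching Open");
        --openers_;
        if (writing) --writers_;
        assert(writers_ <= openers_);
    }

    // The object may only be freed once nobody has it open.
    bool SafeToFree() const { return openers_ == 0; }

private:
    int openers_ = 0;
    int writers_ = 0;
};
```

In this model, a leaked reference shows up as SafeToFree() staying false after the last expected Release, and a double release (which a race between two closers could produce) shows up as the exception.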