reverbrain/eblob

Getting -9 error in random places on Alt Linux

agend opened this issue · 13 comments

agend commented

I'm getting a -9 error at random moments and in random places.

eblob_fill_write_control_from_ram: ERROR-pread-index: position: 952889874, offset: 0, size: 0, flags: 0x0 [], total data size: 0, disk-size: 0, data_fd: 736, index_fd: 737, bc"...)

or

blob: eblob_mark_entry_removed: d1d917000000: eblob_mark_index_removed: FAILED: index, fd: -1, err: -9

Which version of eblob are you using?
Can you show more logs?

@shaitan @abudnik please take a look

Does it happen during datasort?

agend commented

There are no error logs before this one.
I have a call stack for this error:

#1  0x00007ffff6bc1f26 in eblob_log_raw (l=0x66ff70, level=1, format=<value optimized out>) at /usr/src/debug/eblob-0.23.8/library/log.c:83
#2  0x00007ffff6babbe4 in eblob_dump_wc_raw (b=0x6700c0, key=0x7ffff2d91cb0, wc=<value optimized out>, str=0x7ffff6bd8690 "eblob_fill_write_control_from_ram: ERROR-pread-index", err=-9)
    at /usr/src/debug/eblob-0.23.8/library/blob.c:962
#3  eblob_dump_wc (b=0x6700c0, key=0x7ffff2d91cb0, wc=<value optimized out>, str=0x7ffff6bd8690 "eblob_fill_write_control_from_ram: ERROR-pread-index", err=-9) at /usr/src/debug/eblob-0.23.8/library/blob.c:981
#4  0x00007ffff6bad9d8 in eblob_fill_write_control_from_ram (b=0x6700c0, key=0x7ffff2d91cb0, wc=0x7ffff2d91950, for_write=0, old=0x0) at /usr/src/debug/eblob-0.23.8/library/blob.c:1482
#5  0x00007ffff6badda3 in _eblob_read_ll (b=0x6700c0, key=0x7ffff2d91cb0, csum=EBLOB_READ_CSUM, wc=0x7ffff2d91950) at /usr/src/debug/eblob-0.23.8/library/blob.c:2493
#6  0x00007ffff6bae1ec in eblob_read_ll (b=<value optimized out>, key=<value optimized out>, fd=0x7ffff2d91a1c, offset=0x7ffff2d91a10, size=<value optimized out>, csum=<value optimized out>)
    at /usr/src/debug/eblob-0.23.8/library/blob.c:2548
#7  0x00007ffff6bae2cd in eblob_read_data_ll (b=0x6700c0, key=0x7ffff2d91cb0, offset=0, dst=0x7ffff2d91d58, size=0x7ffff2d91d60, csum=EBLOB_READ_CSUM) at /usr/src/debug/eblob-0.23.8/library/blob.c:2601
#8  0x00007ffff7bdbed3 in queue_pull () from /usr/lib/libsolid_queue.so
#9  0x000000000040ecd5 in get_packet_from_queue (element=0x7ffff2d91dc0, worker=0x66acb0) at /home/user/dupd-1.4.5/transmitter/src/transmitter.c:318
#10 0x000000000040e7c5 in transmitter_job (arg=0x66acb0) at /home/user/dupd-1.4.5/transmitter/src/transmitter.c:225
#11 0x00007ffff71529aa in start_thread () from /lib64/libpthread.so.0
#12 0x00007ffff6eb414d in clone () from /lib64/libc.so.6
#13 0x0000000000000000 in ?? ()

agend commented

The eblob version is eblob-0.23.8.

Does it happen during datasort? [2]

agend commented

How can I check it?

On 10 Nov 2015, at 14:05, Andrey Budnik notifications@github.com wrote:

Does it happen during datasort? [2]



Run grep "defrag:" on the logs and look for these two messages:
"defrag: sorting": defrag started (logged if the log level is NOTICE)
"blob: defrag: datasort: success": defrag finished (logged if the log level is INFO)
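If grepping by hand gets tedious, the check can be scripted. The following is only a rough sketch in Python, not part of eblob: the function name `scan` and the regular expression for "-9 at the end of the line" are assumptions made for illustration, and the start and stop markers are the two strings quoted above. It reports, for each -9 line, whether it appeared between a "defrag: sorting" message and the matching "defrag: datasort: success" message.

```python
import re

# Markers quoted in the comment above.
DEFRAG_START = "defrag: sorting"
DEFRAG_STOP = "defrag: datasort: success"

# Assumption: the error code is printed at the end of the line,
# e.g. "... err: -9" or "... eblob_fill_write_control_from_ram: -9."
ERR_MINUS_9 = re.compile(r": -9\.?\s*$")


def scan(lines):
    """Return (line_number, inside_defrag) for every line ending in -9."""
    inside_defrag = False
    hits = []
    for number, line in enumerate(lines, 1):
        if DEFRAG_START in line:
            inside_defrag = True
        elif DEFRAG_STOP in line:
            inside_defrag = False
        elif ERR_MINUS_9.search(line):
            hits.append((number, inside_defrag))
    return hits
```

A typical use would be `scan(open(path))` with `path` pointing at the eblob log. Note that the start marker only shows up at log level NOTICE or higher, so at a lower verbosity every hit would be reported as outside defrag.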

I suspect that the -9 error on the remove operation occurs during datasort.
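For context, -9 is the negated errno EBADF ("bad file descriptor") on Linux. One generic way to get it from pread is to read through a descriptor that has already been closed, which is the kind of thing that could happen if descriptors are replaced while another thread still holds the old numbers. Whether that is what happens here is not confirmed at this point. The sketch below is only an illustration in Python of that generic failure, not eblob code; `pread_after_close` is a made-up name.

```python
import errno
import os
import tempfile


def pread_after_close():
    """Read via a descriptor that was already closed; return -errno like eblob logs do."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"index")
        os.close(fd)  # the descriptor number is now stale
        try:
            os.pread(fd, 5, 0)  # same syscall family as the failing pread on the index
        except OSError as exc:
            return -exc.errno
        return 0
    finally:
        os.unlink(path)
```

On Linux, `pread_after_close()` returns `-errno.EBADF`, that is -9, matching the code in the log lines. In a multithreaded process the stale number may instead be reused by a newly opened file, in which case the read would silently hit the wrong file rather than fail.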

agend commented

Logs right before the -9 error:

blob: eblob_index_blocks_fill: index: bloom filter size: 57600
blob: datasort_swap_memory: defrag: datasort_swap_memory: finished
blob: datasort_swap_disk: defrag: data swap start: data: /srv/queue/#####/data-0.6.datasort.R1TyU7/chunk.L9jNXD -> /srv/queue/#####/data-0.6
blob: datasort_swap_disk: defrag: swapped: data: /srv/queue/#####/data-0.6.datasort.R1TyU7/chunk.L9jNXD -> /srv/queue/#####/data-0.6, data_fd: 539 -> 154, index_fd: 818 -> 605
blob: defrag: datasort: success
blob: 2ac805000000: i6: eblob_fill_write_control_from_ram: ERROR-pread-index: position: 943074477, offset: 0, size: 0, flags: 0x0 [], total data size: 0, disk-size: 0, data_fd: 154, index_fd: 605, bctl: 0x7eff10ad0fb0: -9
blob: eblob_defrag: defrag: completed: 0
blob: datasort_next_defrag: defrag: timed_defrag is: +12 seconds
blob: datasort_next_defrag: defrag: next datasort is sheduled to +60 seconds.
blob: 2ac805000000: _eblob_read_ll: eblob_fill_write_control_from_ram: -9.

agend commented

@abudnik Is there a fix for our case?

Sorry, it is not fixed yet.

agend commented

@abudnik Is there any workaround?

Hi @agend, can you try the changes from #164 and check whether the problem is gone?

agend commented

All fixed now. Thank you @shaitan.