google/stenographer

cleanup deletes too many files

xoh opened this issue · 4 comments

xoh commented

When the disk becomes full (i.e. free space drops below DiskFreePercentage), too many files get deleted. The issue is tedious to reproduce, as the disk first has to fill up. Monitoring used disk space irregularly, I've noticed it drop from 95% to below 80%, sometimes even lower. As far as I understand, only the oldest packet file should be deleted at a time (file sizes on my setup range from 20 MB to 350 MB), to provide the largest possible capture timeframe.
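For context, the behavior I'd expect is roughly the loop below. This is only a sketch of the idea, not stenographer's actual cleanup code; freePercent, oldestFile, the directory path, and the 10% threshold are stand-ins I made up for illustration:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// freePercent returns the free-space percentage of the filesystem
// containing dir (Linux-only, via Statfs).
func freePercent(dir string) (int, error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(dir, &st); err != nil {
		return 0, err
	}
	return int(st.Bavail * 100 / st.Blocks), nil
}

// oldestFile returns the least recently modified regular file in dir.
func oldestFile(dir string) (string, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return "", err
	}
	oldest, oldestMod := "", int64(0)
	for _, e := range entries {
		info, err := e.Info()
		if err != nil || !info.Mode().IsRegular() {
			continue
		}
		if m := info.ModTime().UnixNano(); oldest == "" || m < oldestMod {
			oldest, oldestMod = filepath.Join(dir, e.Name()), m
		}
	}
	if oldest == "" {
		return "", fmt.Errorf("no files left in %s", dir)
	}
	return oldest, nil
}

// pruneUntilEnoughSpace removes one file at a time, oldest first,
// re-checking free space after every deletion, so at most roughly
// one file's worth of data is freed beyond the threshold.
func pruneUntilEnoughSpace(dir string, threshold int) error {
	for {
		free, err := freePercent(dir)
		if err != nil || free > threshold {
			return err
		}
		victim, err := oldestFile(dir)
		if err != nil {
			return err
		}
		if err := os.Remove(victim); err != nil {
			return err
		}
	}
}

func main() {
	// Hypothetical packet directory and a 10% free-space threshold.
	if err := pruneUntilEnoughSpace("/mnt/steno_drive", 10); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```

The point is that files go one at a time, oldest first, with the free-space check repeated after every deletion, so used space should never drop much more than one file size below the threshold.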

I tried to trigger debug outputs like

v(0, "Thread %v disk usage is high (packet path=%q): %d%% free <= %d%% threshold", t.id, t.packetPath, df, t.conf.DiskFreePercentage)
or
v(2, "pruneOldestThreadFiles - files count %v, t.files count %v", len(files), len(t.files))
but nothing appears in syslog/journalctl. Raising the verbosity either fails for me:

Setting "Flags": ["-v=10"] in /etc/stenographer/config triggers /usr/bin/stenotype: invalid option -- '=' on start of stenographer

or results in way more log messages, but none matching those of cleanUpOnLowDiskSpace() or pruneOldestThreadFiles() in stenographer/thread/thread.go:

Setting "Flags": ["-vvvvvvvvvv"] in /etc/stenographer/config leads to tons of logs, mostly "BlockDone", "New block" and "BlockReset"

Almost all logs come from the .cc files, a few from index.h, and some log stenoread requests. Not a single log line in journalctl or /var/log/syslog corresponds to the log calls in stenographer/thread/thread.go, so the extra -v flags apparently only raise the verbosity of the C++ stenotype binary, not of the Go stenographer process.
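For reference, the v(...) calls above look like the usual verbosity-gated logging pattern, roughly as in this minimal sketch; the global verbosity variable and the log destination are my assumptions, not stenographer's actual wiring:

```go
package main

import "log"

// verbosity would be raised by each -v flag on the command line;
// here it's just a plain variable for illustration.
var verbosity = 0

// v logs only when the configured verbosity is at least level.
func v(level int, format string, args ...interface{}) {
	if verbosity >= level {
		log.Printf(format, args...)
	}
}

func main() {
	v(0, "printed at any verbosity")
	v(2, "printed only at verbosity >= 2")
}
```

If the Flags only reach the stenotype binary, the Go side's verbosity stays at its default and the level-2 message above can never appear.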

I've now set up a quick-and-dirty monitoring loop to show the issue:

Timestamp                      Filesystem      Size  Used Avail Use% Mounted on
Wed May 23 11:27:31 CEST 2018: /dev/sdb1        50G   44G  3,0G  94% /mnt/steno_drive
Wed May 23 11:27:46 CEST 2018: /dev/sdb1        50G   43G  3,6G  93% /mnt/steno_drive

Wed May 23 11:32:31 CEST 2018: /dev/sdb1        50G   44G  3,0G  94% /mnt/steno_drive
Wed May 23 11:32:46 CEST 2018: /dev/sdb1        50G   44G  3,5G  93% /mnt/steno_drive

Wed May 23 11:35:31 CEST 2018: /dev/sdb1        50G   44G  3,0G  94% /mnt/steno_drive
Wed May 23 11:35:46 CEST 2018: /dev/sdb1        50G   36G   11G  78% /mnt/steno_drive

Wed May 23 11:36:01 CEST 2018: /dev/sdb1        50G   36G   11G  78% /mnt/steno_drive
Wed May 23 11:36:16 CEST 2018: /dev/sdb1        50G   36G   11G  77% /mnt/steno_drive

Wed May 23 12:28:46 CEST 2018: /dev/sdb1        50G   44G  3,0G  94% /mnt/steno_drive
Wed May 23 12:29:01 CEST 2018: /dev/sdb1        50G   43G  3,8G  92% /mnt/steno_drive

Wed May 23 12:37:01 CEST 2018: /dev/sdb1        50G   44G  3,0G  94% /mnt/steno_drive
Wed May 23 12:37:16 CEST 2018: /dev/sdb1        50G   23G   24G  50% /mnt/steno_drive

Wed May 23 14:28:31 CEST 2018: /dev/sdb1        50G   44G  3,0G  94% /mnt/steno_drive
Wed May 23 14:28:46 CEST 2018: /dev/sdb1        50G   30G   18G  64% /mnt/steno_drive

The third, sixth, and seventh cleanups delete far too many files: used space drops by 8 GB, 21 GB, and 14 GB respectively, which is dozens of packet files rather than a single one.
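For anyone wanting to reproduce this, the monitoring above is just a timestamped df in a 15-second loop; a self-contained Go equivalent might look like the following (the mount path matches my setup, and the 500 MB alert margin is an arbitrary value slightly above my largest packet file):

```go
package main

import (
	"fmt"
	"syscall"
	"time"
)

func main() {
	const mount = "/mnt/steno_drive" // the monitored filesystem
	var prevAvail uint64
	for {
		var st syscall.Statfs_t
		if err := syscall.Statfs(mount, &st); err != nil {
			fmt.Println(err)
			return
		}
		avail := st.Bavail * uint64(st.Bsize)
		fmt.Printf("%s: %d%% free, %d GiB available\n",
			time.Now().Format(time.RFC3339),
			st.Bavail*100/st.Blocks, avail>>30)
		// One cleanup should free at most ~one packet file
		// (<= ~350 MB here); flag anything clearly larger.
		if prevAvail != 0 && avail > prevAvail+500<<20 {
			fmt.Println("  ^ this cleanup freed more than one file's worth")
		}
		prevAvail = avail
		time.Sleep(15 * time.Second)
	}
}
```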

Out of curiosity, was 6f29347 compiled in when you saw this behavior? It was designed to fix this exact issue.

xoh commented

Actually, no, it wasn't. I've now compiled the latest version and will report back after a few cleanups have run.

xoh commented

That was indeed the issue: no cleanups larger than 2% anymore. Thanks!

Awesome, thanks for checking back in and closing things out! And thanks to fming188 for pointing out the root cause in #187 :)