duperemove-master hangup with a reproducer
trofi opened this issue · 3 comments
trofi commented
I think I have a reproducer script for a hanging duperemove. I initially wanted to use it to measure a scalability bottleneck of duperemove, but it looks like I got it to get stuck:
#!/usr/bin/env bash
rm -fv /tmp/h1K.db /tmp/h1M.db
# create a directory suitable for deduping:
# it contains 1M files of size 1024 bytes.
if [[ ! -d dd ]]; then
    echo "Creating directory structure, will take a minute"
    mkdir dd
    for d in $(seq 1 1000); do
        mkdir -v "dd/$d"
        for f in $(seq 1 1000); do
            printf "%*s" 1024 "$f" > "dd/$d/$f"
        done
    done
    sync
fi
echo "duperemove defaults, batch of size 1M"
time { ./duperemove -q --batchsize=1000000 -rd --hashfile=/tmp/h1M.db dd/ >/dev/null 2>&1; }
echo "duperemove defaults, batch of size 1024"
time { ./duperemove -q -rd --hashfile=/tmp/h1K.db dd/ >/dev/null 2>&1; }
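The padding trick the script relies on can be sanity-checked in isolation: `printf "%*s" 1024 "$f"` left-pads the loop variable with spaces to a width of exactly 1024, so every generated file has the same size and consists mostly of a shared run of spaces. A standalone check (not part of the original report):

```shell
# printf "%*s" pads its argument with spaces to the given field width,
# so each generated file body is exactly 1024 bytes.
printf "%*s" 1024 "42" | wc -c
```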
$ time ./bench.bash
duperemove defaults, batch of size 1M
^C^X
real 164m12,365s
user 154m14,324s
sys 1m42,050s
Note: there is no progress after two hours. I think it should succeed in minutes (or tens of minutes at worst). I ran it on compressed btrfs.
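For scale, a back-of-the-envelope on the data set size (simple arithmetic from the numbers in the script, not from duperemove's output):

```shell
# 1000 directories * 1000 files * 1024 bytes per file:
echo "$(( 1000 * 1000 * 1024 )) bytes total"   # about 1 GiB of data to hash
```

Hashing roughly 1 GiB of small files is why a run time measured in hours looks like a hang rather than slow progress.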
JackSlateur commented
Hello,
This function does not scale well.
Your script took ~4 h on my PC.
Using a large batchsize is not a good idea: with the defaults, it runs in 28 min.
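When a run may hang indefinitely, wrapping it in coreutils `timeout` keeps a benchmark bounded. This is a suggestion on my part, not something duperemove provides; the 1-hour limit below is arbitrary. `timeout` exits with status 124 when it has to kill the command, which the stand-in below demonstrates:

```shell
# Stand-in for a long-running duperemove invocation: timeout kills the
# command at the limit and reports exit status 124.
timeout 1 sleep 5
echo "exit=$?"
```

Applied to the reproducer, that would look like `timeout 1h ./duperemove -q -rd --hashfile=/tmp/h1K.db dd/`, so a stuck batch fails the benchmark instead of stalling it for hours.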