yannh/redis-dump-go

Timeouts on large ZSETs

linydquantil opened this issue · 24 comments

redis version: 5.0.4
Sometimes it (redis-dump-go) hangs and waits for a long time.

yannh commented

Hi @linydquantil, is it possible you're having connectivity issues / wifi? I'm not aware of this kind of issue so far.

The redis server is on localhost. For example:

# Keyspace

db0:keys=77,expires=0,avg_ttl=0
db1:keys=81636,expires=0,avg_ttl=0

#./redis-dump-go -db 1 > dump.resp.1
Database 1: 52800 element dumpedError: read tcp 127.0.0.1:60858->127.0.0.1:6379: i/o timeout
Error: read tcp 127.0.0.1:60858->127.0.0.1:6379: i/o timeout
Error: read tcp 127.0.0.1:60858->127.0.0.1:6379: i/o timeout
Error: read tcp 127.0.0.1:60858->127.0.0.1:6379: i/o timeout
Database 1: 72800 element dumped
Database 1: 81636 element dumped

And then it hangs. Is it still dumping data at that point?

yannh commented

The tool opens 10 connections by default, and it seems like some of them time out. Can you try with -n 1 or -n 2? If there is any public dataset I could try to reproduce this on, that would be quite useful.
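For reference, that is the same invocation as above with the worker count lowered via the -n flag mentioned here (the value 1 is just an example):

# ./redis-dump-go -db 1 -n 1 > dump.resp.1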

I found that when the key type is ZSET and the value of the key is very large, redis-dump-go hangs.

When the key type is ZSET and its value has 500w entries, its output is very slow.

yannh commented

Hi @linydquantil, 500w? 500 items?

Yes.
The func is:
[screenshot of the function]

yannh commented

Right - but is that Redis command itself slow? It seems I can increase timeouts so it doesn't time out, but I'm not sure why it would be so slow.

If I only fetch the data from Redis using the ZRANGE command, it takes about 1 minute.

But when we use radix.Cmd(&val, "ZRANGE", key...), it becomes very slow.

I think the slow step may be formatting the data obtained from Redis into val.
Could this step be optimized?
Thanks

If you have any ideas, please let me know.
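For context, the call pattern being discussed looks roughly like the sketch below (a minimal radix v3 illustration; the pool setup, address and key name are made up for the example, this is not the tool's actual code). Decoding a multi-million-member reply into one big []string is where the time and memory go.

// Minimal sketch: fetching an entire sorted set with radix v3.
// Pool size, address and key name are illustrative only.
package main

import (
	"fmt"
	"log"

	"github.com/mediocregopher/radix/v3"
)

func main() {
	pool, err := radix.NewPool("tcp", "127.0.0.1:6379", 1)
	if err != nil {
		log.Fatal(err)
	}
	defer pool.Close()

	// For a ZSET with millions of members, every member is decoded into
	// this single, very large []string before anything can be written out.
	var members []string
	if err := pool.Do(radix.Cmd(&members, "ZRANGE", "mykey", "0", "-1")); err != nil {
		log.Fatal(err)
	}
	fmt.Println("members:", len(members))
}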

I noticed the same problem, with redis-dump-go seemingly hanging when dumping a large ZSET.
I have a Redis database with just one key, type ZSET, with 1259853 entries, size ~170M.

Accessing, for example, the first 10000 entries with redis-cli is near-instantaneous, but redis-dump-go just seems to hang and does not output anything. Using perf top I can see that it's doing something, but nothing comes out. I've waited for more than an hour before hitting Ctrl+C. The dump is taken inside the container where Redis is running.

[root@5fdd0bc6fb04 tmp]# ./redis-dump-go > out
Database 0: 1 element dumped

perf top:

  31.37%  redis-dump-go       [.] runtime.memmove
  23.29%  redis-dump-go       [.] runtime.greyobject
  20.93%  redis-dump-go       [.] runtime.scanobject
  16.19%  redis-dump-go       [.] runtime.findObject
   0.42%  redis-dump-go       [.] runtime.memclrNoHeapPointers
yannh commented

Hi @mpartio, interesting - I have not tested performance with single entries this large. I can't promise when I will have time to investigate, but with the information you provided I might be able to reproduce it. Thanks!

I tracked down the issue to string concatenation, namely this loop here:

for _, arg := range cmd {

Maybe using a string builder would be better in case very large strings are created?
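To illustrate the point (a generic sketch of RESP command serialization, not the repository's actual function): += copies everything built so far on every iteration, so serializing a command with millions of arguments becomes quadratic, whereas strings.Builder appends into a single growing buffer.

package main

import (
	"fmt"
	"strings"
)

// serializeSlow builds the RESP representation with string concatenation;
// each += reallocates and copies the whole result accumulated so far.
func serializeSlow(cmd []string) string {
	s := fmt.Sprintf("*%d\r\n", len(cmd))
	for _, arg := range cmd {
		s += fmt.Sprintf("$%d\r\n%s\r\n", len(arg), arg)
	}
	return s
}

// serializeFast produces the same output with strings.Builder,
// which grows one buffer instead of copying on every iteration.
func serializeFast(cmd []string) string {
	var b strings.Builder
	fmt.Fprintf(&b, "*%d\r\n", len(cmd))
	for _, arg := range cmd {
		fmt.Fprintf(&b, "$%d\r\n%s\r\n", len(arg), arg)
	}
	return b.String()
}

func main() {
	cmd := []string{"ZADD", "bigzset", "1", "member1", "2", "member2"}
	fmt.Println(serializeSlow(cmd) == serializeFast(cmd)) // true
}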

yannh commented

Yep - I also think this is the problem. On it.

yannh commented

Just pushed v0.4.5. Now I'll go cry myself to sleep 🤣


Please let me know if it helps. It should be a nice little performance improvement for most use cases. Thanks for the pointer 👍

Hi @yannh @mpartio, well done. I just tested the latest redis-dump-go.

# Keyspace
db0:keys=293,expires=17,avg_ttl=0
db1:keys=81636,expires=0,avg_ttl=0                 <- my large key is in db1
db2:keys=3996,expires=0,avg_ttl=0
db3:keys=2198,expires=0,avg_ttl=0
db4:keys=500002,expires=0,avg_ttl=0
db5:keys=628,expires=251,avg_ttl=0
db12:keys=62,expires=0,avg_ttl=0

The command output is:

[root@bos-test-redis tmp]# ./redis-dump-go -db 0 > dump.resp
Database 0: 293 element dumped
[root@bos-test-redis tmp]# ./redis-dump-go -db 1 > dump.resp 
Database 1: 3700 element dumpedError: read tcp 127.0.0.1:59440->127.0.0.1:6379: i/o timeout
Error: read tcp 127.0.0.1:59440->127.0.0.1:6379: i/o timeout
Error: read tcp 127.0.0.1:59440->127.0.0.1:6379: i/o timeout
Error: read tcp 127.0.0.1:59440->127.0.0.1:6379: i/o timeout
Error: read tcp 127.0.0.1:59440->127.0.0.1:6379: i/o timeout
Error: read tcp 127.0.0.1:59440->127.0.0.1:6379: i/o timeout
Error: read tcp 127.0.0.1:59440->127.0.0.1:6379: i/o timeout
Database 1: 3800 element dumpedError: read tcp 127.0.0.1:59434->127.0.0.1:6379: i/o timeout
Error: read tcp 127.0.0.1:59434->127.0.0.1:6379: i/o timeout
Error: read tcp 127.0.0.1:59438->127.0.0.1:6379: i/o timeout
Database 1: 5700 element dumpedError: read tcp 127.0.0.1:59436->127.0.0.1:6379: i/o timeout
Error: read tcp 127.0.0.1:59436->127.0.0.1:6379: i/o timeout
Error: read tcp 127.0.0.1:59436->127.0.0.1:6379: i/o timeout
Error: read tcp 127.0.0.1:59436->127.0.0.1:6379: i/o timeout
Error: read tcp 127.0.0.1:59436->127.0.0.1:6379: i/o timeout
Error: read tcp 127.0.0.1:59436->127.0.0.1:6379: i/o timeout
Error: read tcp 127.0.0.1:59436->127.0.0.1:6379: i/o timeout
Error: read tcp 127.0.0.1:59436->127.0.0.1:6379: i/o timeout
Database 1: 5800 element dumpedError: read tcp 127.0.0.1:32884->127.0.0.1:6379: i/o timeout
Error: read tcp 127.0.0.1:32884->127.0.0.1:6379: i/o timeout
Database 1: 7515 element dumped
[root@bos-test-redis tmp]# ./redis-dump-go -db 2 > dump.resp 
Database 2: 3996 element dumped
[root@bos-test-redis tmp]# ./redis-dump-go -db 3 > dump.resp 
Database 3: 2198 element dumped
[root@bos-test-redis tmp]# ./redis-dump-go -db 4 > dump.resp 
Database 4: 500158 element dumped

It still produces some errors.

yannh commented

@linydquantil that seems to be a timeout in the lower-level Redis library the tool is using (Radix). I've released v0.4.6 with a 5m timeout. All of this is pretty difficult to test without integration tests, which I haven't found the time to write just yet. Your continued support is much appreciated.
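For readers who hit the same i/o timeouts in their own code: with radix v3 the read/write timeouts are dial options, roughly as in the sketch below. This is only an illustration of the kind of setting involved (the address, pool size and option choices here are assumptions, not the exact change shipped in v0.4.6); the 5-minute value mirrors the timeout mentioned above.

// Minimal sketch (radix v3): raising per-connection read/write timeouts so
// that slow replies on huge keys don't trip an i/o timeout.
package main

import (
	"log"
	"time"

	"github.com/mediocregopher/radix/v3"
)

func main() {
	connFunc := func(network, addr string) (radix.Conn, error) {
		return radix.Dial(network, addr,
			radix.DialReadTimeout(5*time.Minute),
			radix.DialWriteTimeout(5*time.Minute),
		)
	}

	pool, err := radix.NewPool("tcp", "127.0.0.1:6379", 10, radix.PoolConnFunc(connFunc))
	if err != nil {
		log.Fatal(err)
	}
	defer pool.Close()

	var reply string
	if err := pool.Do(radix.Cmd(&reply, "PING")); err != nil {
		log.Fatal(err)
	}
	log.Println(reply) // PONG
}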

@yannh
I used v0.4.6 to run some tests. The tool works well for me.

[root@bos-test-redis tmp]# ./redis-dump-go -db 1 > dump.resp
Database 1: 7515 element dumped

Next, I will do a more detailed test.

yannh commented

Well, that took only a year ;) Sorry for the long delay, I didn't realise the values might be this large. Thanks for sticking around! 💪

Thanks @yannh, with 0.4.5 the dump is near-instantaneous.
I do have problems importing the result into a new Redis instance though:

$ redis-cli --pipe < dump.resp
Error writing to the server: Connection reset by peer

This could be something in my environment though, as dump.resp looks like correct Redis protocol. I'll have to dig deeper to find out what's wrong. Thank you for your great support!

@mpartio you can test v0.4.6

yannh commented

@mpartio redis-cli might have similar timeouts to redis-dump-go... I'll close this ticket for now, since that issue looks unrelated. Feel free to re-open or create a new ticket should the need arise. Thanks!