Timeouts on large ZSETs
linydquantil opened this issue · 24 comments
redis version: 5.0.4
Sometimes it (redis-dump-go) hangs and waits for a long time.
Hi @linydquantil, is it possible you're having connectivity issues / wifi? I'm not aware of this kind of issue so far.
The Redis server is on localhost.
For example:
# Keyspace
db0:keys=77,expires=0,avg_ttl=0
db1:keys=81636,expires=0,avg_ttl=0
# ./redis-dump-go -db 1 > dump.resp.1
Database 1: 52800 element dumpedError: read tcp 127.0.0.1:60858->127.0.0.1:6379: i/o timeout
Error: read tcp 127.0.0.1:60858->127.0.0.1:6379: i/o timeout
Error: read tcp 127.0.0.1:60858->127.0.0.1:6379: i/o timeout
Error: read tcp 127.0.0.1:60858->127.0.0.1:6379: i/o timeout
Database 1: 72800 element dumped
Database 1: 81636 element dumped
and then it hangs
So is it still dumping data?
The tool opens 10 connections by default; it seems like some of these time out. Can you try with -n 1 or -n 2? If there is any public dataset I could try to reproduce this on, it would be quite useful.
I found that when the key type is zset and the value of the key is too large, redis-dump-go hangs.
When the key type is zset and its value has 500w (5,000,000) entries, the output is very slow.
Hi @linydquantil , 500w? 500 items?
Right - but is that Redis command itself slow? It seems I can increase timeouts so it doesn't time out, but I'm not sure why it would be so slow.
If I only fetch the data from Redis using the ZRANGE command, it takes about 1 minute.
But when we use radix.Cmd(&val, "ZRANGE", key...), it becomes very slow.
I think the slow step may be unmarshalling the data returned by Redis into val.
Can this step be optimized?
Thanks.
If you have any idea, please let me know.
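For reference, here is a minimal sketch of the call pattern being discussed, assuming radix v3 (the client library redis-dump-go uses); the pool setup and the key name are placeholders, not the tool's actual code:

package main

import (
	"fmt"
	"log"

	"github.com/mediocregopher/radix/v3"
)

func main() {
	// Placeholder pool: one connection to a local Redis.
	client, err := radix.NewPool("tcp", "127.0.0.1:6379", 1)
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// Fetch every member of the sorted set in a single round trip.
	// With millions of members this one reply can be hundreds of MB,
	// and unmarshalling it into val is where the time goes.
	var val []string
	if err := client.Do(radix.Cmd(&val, "ZRANGE", "mykey", "0", "-1")); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("fetched %d members\n", len(val))
}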
I noticed the same problem with redis-dump-go seemingly hanging when dumping a large ZSET.
I have a Redis database with just one key: a ZSET with 1259853 entries, ~170M in size.
Accessing, for example, the first 10000 entries with redis-cli is near-instantaneous, but redis-dump-go just seems to hang and does not output anything. Using perf top I can see that it's doing something, but nothing comes out. I waited for more than an hour before hitting Ctrl+C. The dump is taken inside the container where Redis is running.
[root@5fdd0bc6fb04 tmp]# ./redis-dump-go > out
Database 0: 1 element dumped
perf top:
31.37% redis-dump-go [.] runtime.memmove
23.29% redis-dump-go [.] runtime.greyobject
20.93% redis-dump-go [.] runtime.scanobject
16.19% redis-dump-go [.] runtime.findObject
0.42% redis-dump-go [.] runtime.memclrNoHeapPointers
Hi @mpartio , interesting, I have not tested performance with single entries this large. I can't promise when I will have time to investigate, but with the information you provided I might be able to reproduce. Thanks!
I tracked down the issue to string concatenation, namely this loop here:
redis-dump-go/redisdump/redisdump.go, line 60 at commit 5723398
Maybe using a string builder would be better in case very large strings are created?
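For illustration only (this is not the actual redis-dump-go loop), here is the difference between the two approaches: += copies the whole accumulated string on every iteration, which is quadratic and produces exactly the kind of memmove/GC load visible in the perf profile above, while strings.Builder appends into a growable buffer:

package main

import (
	"fmt"
	"strings"
)

// buildSlow concatenates with +=: each iteration reallocates and copies
// the entire accumulated string, so total work is O(n^2) and the runtime
// spends its time in memmove and garbage collection.
func buildSlow(parts []string) string {
	s := ""
	for _, p := range parts {
		s += p
	}
	return s
}

// buildFast uses strings.Builder, which appends into a growable internal
// buffer: O(n) total work and far less garbage.
func buildFast(parts []string) string {
	var b strings.Builder
	for _, p := range parts {
		b.WriteString(p)
	}
	return b.String()
}

func main() {
	parts := []string{"SET", " ", "key", " ", "value"}
	fmt.Println(buildSlow(parts) == buildFast(parts)) // true
}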
Yep - I also think this is the problem. On it.
Hi @yannh @mpartio, well done. I just tested the latest redis-dump-go.
# Keyspace
db0:keys=293,expires=17,avg_ttl=0
db1:keys=7515,expires=0,avg_ttl=0 (my large key is in db1)
db2:keys=3996,expires=0,avg_ttl=0
db3:keys=2198,expires=0,avg_ttl=0
db4:keys=500002,expires=0,avg_ttl=0
db5:keys=628,expires=251,avg_ttl=0
db12:keys=62,expires=0,avg_ttl=0
The command results are:
[root@bos-test-redis tmp]# ./redis-dump-go -db 0 > dump.resp
Database 0: 293 element dumped
[root@bos-test-redis tmp]# ./redis-dump-go -db 1 > dump.resp
Database 1: 3700 element dumpedError: read tcp 127.0.0.1:59440->127.0.0.1:6379: i/o timeout
Error: read tcp 127.0.0.1:59440->127.0.0.1:6379: i/o timeout
Error: read tcp 127.0.0.1:59440->127.0.0.1:6379: i/o timeout
Error: read tcp 127.0.0.1:59440->127.0.0.1:6379: i/o timeout
Error: read tcp 127.0.0.1:59440->127.0.0.1:6379: i/o timeout
Error: read tcp 127.0.0.1:59440->127.0.0.1:6379: i/o timeout
Error: read tcp 127.0.0.1:59440->127.0.0.1:6379: i/o timeout
Database 1: 3800 element dumpedError: read tcp 127.0.0.1:59434->127.0.0.1:6379: i/o timeout
Error: read tcp 127.0.0.1:59434->127.0.0.1:6379: i/o timeout
Error: read tcp 127.0.0.1:59438->127.0.0.1:6379: i/o timeout
Database 1: 5700 element dumpedError: read tcp 127.0.0.1:59436->127.0.0.1:6379: i/o timeout
Error: read tcp 127.0.0.1:59436->127.0.0.1:6379: i/o timeout
Error: read tcp 127.0.0.1:59436->127.0.0.1:6379: i/o timeout
Error: read tcp 127.0.0.1:59436->127.0.0.1:6379: i/o timeout
Error: read tcp 127.0.0.1:59436->127.0.0.1:6379: i/o timeout
Error: read tcp 127.0.0.1:59436->127.0.0.1:6379: i/o timeout
Error: read tcp 127.0.0.1:59436->127.0.0.1:6379: i/o timeout
Error: read tcp 127.0.0.1:59436->127.0.0.1:6379: i/o timeout
Database 1: 5800 element dumpedError: read tcp 127.0.0.1:32884->127.0.0.1:6379: i/o timeout
Error: read tcp 127.0.0.1:32884->127.0.0.1:6379: i/o timeout
Database 1: 7515 element dumped
[root@bos-test-redis tmp]# ./redis-dump-go -db 2 > dump.resp
Database 2: 3996 element dumped
[root@bos-test-redis tmp]# ./redis-dump-go -db 3 > dump.resp
Database 3: 2198 element dumped
[root@bos-test-redis tmp]# ./redis-dump-go -db 4 > dump.resp
Database 4: 500158 element dumped
It still produces some errors.
@linydquantil that seems to be a timeout in the lower-level Redis library the tool is using (Radix). I've released v0.4.6 with a 5m timeout. All of this is pretty difficult to test without integration tests, which I haven't found the time to write just yet. Your continuous support is much appreciated.
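I haven't verified the exact change that went into v0.4.6, but raising the read deadline with radix v3 looks roughly like this sketch; the 5-minute value mirrors the timeout mentioned above, and the address and pool size are placeholders:

package main

import (
	"log"
	"time"

	"github.com/mediocregopher/radix/v3"
)

func main() {
	// Dial each pool connection with a 5-minute read timeout so that a
	// single huge ZRANGE reply no longer trips the default deadline.
	connFunc := func(network, addr string) (radix.Conn, error) {
		return radix.Dial(network, addr,
			radix.DialReadTimeout(5*time.Minute))
	}

	pool, err := radix.NewPool("tcp", "127.0.0.1:6379", 10,
		radix.PoolConnFunc(connFunc))
	if err != nil {
		log.Fatal(err)
	}
	defer pool.Close()
}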
@yannh
I used v0.4.6 to run some tests. This tool works well for me.
[root@bos-test-redis tmp]# ./redis-dump-go -db 1 > dump.resp
Database 1: 7515 element dumped
Next, I will do a more detailed test.
Well, that took only a year ;) Sorry for the long delay, I didn't realise the values might have been this large. Thanks for sticking around! 💪
Thanks @yannh, with 0.4.5 the dump is near-instantaneous.
I do have problems importing the result into a new Redis instance, though:
$ redis-cli --pipe < dump.resp
Error writing to the server: Connection reset by peer
This could be something in my environment though, as dump.resp looks like correct Redis protocol. I have to dig deeper to find out what's wrong. Thank you for your great support!
@mpartio, you can try v0.4.6.