RedBeardLab/rediSQL

Lock issue: running a long SQL statement while streaming data is being written into a table

Closed this issue · 30 comments

#66 (comment)

Does this timeout?

I need to fix the timeout as well.

What I believe happens is that the query takes more than 10 seconds, RediSQL times out and replies to you with an empty response. However, the query eventually terminates and tries to reply, but the internal state is "corrupted" by the previous timed-out reply, so the thread crashes. Only the thread crashes, so RediSQL keeps running and you just see something that stops working.

Can you confirm this with the debug build? Just post the logs if you can reproduce it.

I tested this in two scenarios.
First, a table with 4,000k rows: I executed the command "delete from a where id>2000000 and id<3000000", and my client got a "read timeout" error, but the database could still work; it was only the 10s timeout.
redis log:

[2019-07-02T07:35:44Z DEBUG redisql_lib::redis] Exec | Query = "delete from a where id>2000000 and id<3000000"
[2019-07-02T07:35:44Z DEBUG redisql_lib::redis] do_execute | created statement
[2019-07-02T07:35:44Z DEBUG redisql_lib::community_statement] Execute | Acquired db lock
[2019-07-02T07:35:44Z DEBUG redisql_lib::community_statement] Execute | Read row modified before
[2019-07-02T07:35:54Z DEBUG redisql_lib::sqlite] FromIterator => DONECursor
[2019-07-02T07:35:54Z DEBUG redisql_lib::community_statement] Execute=> Executed trains of statements
[2019-07-02T07:35:54Z DEBUG redisql_lib::community_statement] Execute=>DONECursor
[2019-07-02T07:35:54Z DEBUG redisql_lib::redis] do_execute | statement executed
[2019-07-02T07:35:54Z DEBUG redisql_lib::redis_type] Free thread safe context
[2019-07-02T07:35:54Z DEBUG redisql_lib::redis] RedisContextSet | Drop
[2019-07-02T07:35:54Z DEBUG redisql_lib::redis] with_contex_set | exit
[2019-07-02T07:35:54Z DEBUG redisql_lib::redis] Exec | DONE, returning result
[2019-07-02T07:35:54Z DEBUG redisql_lib::redis] Loop iteration

The second scenario: a table with 4,000k rows, with the streaming data continuously being written into it, and I execute the command "delete from b where id>1000000 and id<2000000".
My client got a "timeout" error, and the database could no longer work.

the redis log:
[2019-07-02T08:55:17Z DEBUG redisql_lib::redis] Exec | Query = "delete from b where id>=1000000 and id<2000000"
[2019-07-02T08:55:17Z DEBUG redisql_lib::redis] do_execute | created statement
[2019-07-02T08:55:17Z DEBUG redisql_lib::community_statement] Execute | Acquired db lock
[2019-07-02T08:55:17Z DEBUG redisql_lib::community_statement] Execute | Read row modified before
[2019-07-02T08:55:17Z DEBUG redis_sql::commands] Exec | GotDB
[2019-07-02T08:55:17Z DEBUG redis_sql::commands] Exec | BlockedClient
[2019-07-02T08:55:17Z DEBUG redis_sql::commands] Exec | Create Command
[2019-07-02T08:55:28Z DEBUG redis_sql::commands] Exec | GotDB
[2019-07-02T08:55:28Z DEBUG redis_sql::commands] Exec | BlockedClient
[2019-07-02T08:55:28Z DEBUG redis_sql::commands] Exec | Create Command
[2019-07-02T08:55:28Z DEBUG redisql_lib::sqlite] FromIterator => DONECursor
[2019-07-02T08:55:28Z DEBUG redisql_lib::community_statement] Execute=> Executed trains of statements
[2019-07-02T08:55:28Z DEBUG redisql_lib::community_statement] Execute=>DONECursor
[2019-07-02T08:55:28Z DEBUG redisql_lib::redis] do_execute | statement executed
[2019-07-02T08:55:28Z DEBUG redisql_lib::redis_type] Free thread safe context
[2019-07-02T08:55:28Z DEBUG redisql_lib::redis] RedisContextSet | Drop
[2019-07-02T08:55:28Z DEBUG redisql_lib::redis] with_contex_set | exit
[2019-07-02T08:55:28Z DEBUG redisql_lib::redis] Exec | DONE, returning result
[2019-07-02T08:55:28Z DEBUG redisql_lib::redis] Loop iteration

Why does this happen? Is it caused by lock contention?

Can you believe that I can't reproduce the issue?

I am able to send the database into timeout, but then, after the timeout, it keeps working.

Can you please try to post reproducible examples?

Here is one of my attempts:

127.0.0.1:6379> redisql.create_db DB
OK
127.0.0.1:6379> REDISQL.exec DB "create table t(id, data,b,c,d,e);"
1) DONE
2) (integer) 0
127.0.0.1:6379> REDISQL.exec DB "INSERT INTO t WITH RECURSIVE cnt(x) AS (VALUES(1) UNION ALL SELECT x+1 FROM cnt WHERE x<14000000) SELECT  cnt.x, random() % 100000000, randomblob(21), randomblob(31), randomblob(26), random() % 1000 FROM c
nt;"
(nil)
(10.03s)
127.0.0.1:6379> REDISQL.exec DB "select 1;"
1) 1) (integer) 1
127.0.0.1:6379> REDISQL.exec DB "DELETE FROM t where e>200 and e<800"
1) DONE
2) (integer) 4194334
(3.21s)

Is it possible that you are just not waiting long enough?

Do you think we should abort the query if it takes more than the timeout?

When I execute the delete SQL, the streaming data does not stop writing. So I have two clients: one writes, the other queries.
I have set maxWaitMillis in my code.

client a:

import org.apache.commons.pool2.impl.GenericObjectPoolConfig;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;

GenericObjectPoolConfig genericObjectPoolConfig = new GenericObjectPoolConfig();
genericObjectPoolConfig.setMaxIdle(10);
genericObjectPoolConfig.setMinIdle(10);
genericObjectPoolConfig.setMaxWaitMillis(-1); // -1 means no limit on waiting for a pooled connection
JedisPool jedisPool = new JedisPool(genericObjectPoolConfig, ip, port);
Jedis jedis = jedisPool.getResource();
// executeSQL is a custom extension; stock Jedis has no such method
int id = 1;
while (true) {
    jedis.executeSQL("test", "insert into a(id,device_id,cert_id) values(" + id + ",0,0)");
    id++;
}

client b:

GenericObjectPoolConfig genericObjectPoolConfig = new GenericObjectPoolConfig();
genericObjectPoolConfig.setMaxIdle(10);
genericObjectPoolConfig.setMinIdle(10);
genericObjectPoolConfig.setMaxWaitMillis(-1);
JedisPool jedisPool = new JedisPool(genericObjectPoolConfig, ip, port);
Jedis jedis = jedisPool.getResource();
// one long-running delete, issued while client a keeps inserting
List test1 = jedis.executeSQL("test", "delete from a where id>=2000000 and id<3000000");
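
A note on the snippets above: maxWaitMillis only bounds how long getResource() may wait for a free pooled connection; the "read timeout" seen earlier comes from the socket timeout, which is a separate value that can be passed to the JedisPool constructor. A minimal sketch with stock Jedis (the host, port, and 60s value are placeholders):

import org.apache.commons.pool2.impl.GenericObjectPoolConfig;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;

public class PoolTimeouts {
    public static void main(String[] args) {
        GenericObjectPoolConfig config = new GenericObjectPoolConfig();
        config.setMaxWaitMillis(-1);   // unbounded wait for a pooled connection
        int socketTimeoutMs = 60000;   // how long one command may run before "read timeout"
        // The last constructor argument is the connection/socket timeout in ms;
        // raising it keeps long REDISQL.EXEC calls from failing on the client side.
        JedisPool pool = new JedisPool(config, "127.0.0.1", 6379, socketTimeoutMs);
        try (Jedis jedis = pool.getResource()) {
            System.out.println(jedis.ping());
        }
        pool.close();
    }
}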

My problem is the same.
When querying and inserting data at the same time, Redis goes down.
This problem is serious.
@siscia

Alright, I hadn't realized that there were 2 clients.

However, please keep in mind that the design of RediSQL is single-threaded.

You have one thread that does the work, so you can't do two things at a time; if one of those things is slow, the commands just queue up.

Eventually everything gets done, but only eventually.

Does this fit your observations?

If you need to delete stuff from your database, did you consider using multiple databases? Like one for each day?
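
As a concrete sketch of the one-database-per-day idea, assuming Jedis 3.x (the enum, key names, and schema below are hypothetical; the executeSQL helper used above is a custom extension and is not shown here):

import java.time.LocalDate;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.commands.ProtocolCommand;
import redis.clients.jedis.util.SafeEncoder;

public class DailyDatabases {
    // RediSQL has no stock Jedis bindings, so the module commands are sent raw.
    enum RediSQL implements ProtocolCommand {
        CREATE_DB("REDISQL.CREATE_DB"), EXEC("REDISQL.EXEC");
        private final byte[] raw;
        RediSQL(String name) { this.raw = SafeEncoder.encode(name); }
        @Override public byte[] getRaw() { return raw; }
    }

    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("127.0.0.1", 6379)) {
            String today = "events_" + LocalDate.now();  // e.g. events_2019-07-02
            jedis.sendCommand(RediSQL.CREATE_DB, today); // errors if it already exists
            jedis.sendCommand(RediSQL.EXEC, today,
                    "create table if not exists a(id, device_id, cert_id);");
            // Dropping a whole day is then a single DEL of the database key,
            // instead of a long DELETE that blocks RediSQL's worker thread.
            jedis.del("events_" + LocalDate.now().minusDays(1));
        }
    }
}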

My data is updated in real time, and I also need to query it.
But it should not lead to a Redis crash! Waiting is OK, but Redis must not become unavailable.

Ok, so the problem is that RediSQL crashes.

And that is consistent with the waiting, ok!

I will do as always: try to reproduce the problem and then fix it.

If you have a log of the crash, that would be wonderful and it would speed up my work quite a bit.

Well, I will continue to test; I need to upgrade the version first.
I expect the crash log is consistent with the log in his issue.
I have split one database into multiple databases so that each database holds about 40K rows, but this problem still occurs.

Do you have any convenient contact channel?
It would let us communicate more easily.

I follow the Gitter channel (Slack-like) that is linked in the README.

And I have an email: simone@redbeardlab.com

The most helpful thing could be to add an integration test, like here: https://github.com/RedBeardLab/rediSQL/blob/master/test/correctness/test.py

After reading these tests, I found that the tables being tested were not particularly large.
The RediSQL tables I am using have more than 20 columns, and the queries can be complicated.
Joining tables at the million-row level is particularly slow, so I write the result into RediSQL after the join.
For example: count(distinct user_id) or group by user_id (more than 10,000 groups).
This causes the RediSQL database to crash, or the Redis service to crash,
when writing and reading simultaneously.

Indeed, the tests were there mostly to make sure that a feature works as intended and that we don't introduce regressions.

It is definitely a good idea to introduce more complex tests, with more complex tables.

I will do it as soon as possible, but if you can produce a pull request for this, it will really speed up my work. If the PR also causes a crash similar to the one you are seeing, that is even better, but it is not required.

If you need help in how to run the test, please let me know.

You just need to put redis-server on your PATH and point the variable REDIS_MODULE_PATH to the RediSQL library. Then execute python test.py.
(I use python2 for this at the moment)

Oh, the crash log is the same as his
#69 (comment)

Alright, I hadn't realized that there were 2 clients.

However, please keep in mind that the design of RediSQL is single-threaded.

You have one thread that does the work, so you can't do two things at a time; if one of those things is slow, the commands just queue up.

Eventually everything gets done, but only eventually.

Does this fit your observations?

No, after that the RediSQL database cannot work anymore.

The version of redis is still 5.0.5?

The version of redis is still 5.0.5?

yes

Indeed, I see some changes from 5.0.3 to 5.0.5.

Let me work on it!

As always test cases will tremendously help and speed up my work!

I tested Redis 5.0.3 + RediSQL 1.0.3.
It is OK.
I will keep watching it for a while.

Did you guys find the same issue in 5.0.3?

From first tests, it seems like the problem during big joins is that RediSQL uses too much memory.

On 5.0.3 it starts to use swap and then it gets killed by the OOM killer. On 5.0.5 it seems that Redis cannot allocate memory and it crashes.

Do you guys see something that could be consistent with this hypothesis?
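
One way to check this hypothesis from the client side is to poll INFO memory while the big join runs and watch used_memory climb. A minimal sketch (the host and the 60-second window are arbitrary):

import redis.clients.jedis.Jedis;

public class MemoryWatch {
    public static void main(String[] args) throws InterruptedException {
        try (Jedis jedis = new Jedis("127.0.0.1", 6379)) {
            for (int i = 0; i < 60; i++) {
                // Print the two most telling fields from INFO memory.
                for (String line : jedis.info("memory").split("\r\n")) {
                    if (line.startsWith("used_memory:")
                            || line.startsWith("used_memory_peak:")) {
                        System.out.println(line);
                    }
                }
                Thread.sleep(1000);
            }
        }
    }
}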

I changed to 5.0.3, and this problem still occurs.

It causes the database to be unavailable.

Can this be an out-of-memory issue?

Is the database just locked? And is the process still running, or has it stopped?

Guys, do you mind trying these two binaries?

https://github.com/RedBeardLab/rediSQL/files/3439238/rc-3.zip

Our Linux does not support GLIBC_2.18; we need to build with cargo. Do you have the source code?

It is this branch over here:

#79

If you are a little patient, I can provide binaries built against a smaller glibc version as well.

After using https://github.com/RedBeardLab/rediSQL/files/3439238/rc-3.zip, memory grows rapidly: it doubled within 1 hour and kept growing, so we rolled back to the original version. @siscia

Let me prepare the latest RC for you guys to test.

Can you guys share your AOF file? Or something close to it!

Beware that with that file I would be able to read the data.

With that file I can reproduce your environment and debug the issue better!

The file is too big.
Previously we compiled with the timeout-honor branch.
We will try compiling with master now.

What glibc are you using?