apple/foundationdb

SQLite in StorageServer deadlocked after the node was disconnected and resumed.


I have a 3-node, 3-replica fdb cluster.
Version: 7.1.27
Steps to reproduce: Node A is disconnected from the network for 10 minutes and then restored. While data movement is still in progress, Node B is disconnected from the network for 10 minutes and then restored. The StorageServer on Node B then enters an infinite loop. The pstack output is as follows:
[screenshot: pstack]

The backtrace printed in Net2RunLoopTrace is as follows:
[screenshot: 20240815-172850]

I attached gdb and found that the process could not get a lock from SQLite. It then retried indefinitely.
The code is at https://github.com/apple/foundationdb/blob/main/contrib/sqlite/sqlite3.amalgamation.c#L37717
[screenshot: screenshot-20240815-173116]

The rc is SQLITE_BUSY.
The lockIdx is 4 and n is 4.
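
For readers less familiar with SQLITE_BUSY, here is a minimal sketch of the general failure mode, written with Python's standard `sqlite3` module. It is not FoundationDB's code path and does not reproduce this bug. It only illustrates, under assumed names (`try_write_with_bounded_retry`, `demo`, the table `t`), that a writer keeps getting SQLITE_BUSY for as long as another connection holds the lock, so a retry loop without an upper bound would never finish if the lock is never released.

```python
import os
import sqlite3
import tempfile
import time


def try_write_with_bounded_retry(path, max_attempts=3, delay_s=0.01):
    """Try to insert one row; give up after max_attempts busy failures.

    Hypothetical helper for illustration only. Returns (succeeded, attempts).
    """
    # timeout=0 disables SQLite's built-in busy wait, so a held lock
    # surfaces immediately as "database is locked" (SQLITE_BUSY).
    conn = sqlite3.connect(path, timeout=0, isolation_level=None)
    attempts = 0
    try:
        while attempts < max_attempts:
            attempts += 1
            try:
                conn.execute("BEGIN IMMEDIATE")
                conn.execute("INSERT INTO t VALUES (1)")
                conn.execute("COMMIT")
                return True, attempts
            except sqlite3.OperationalError as exc:
                if "locked" not in str(exc):
                    raise
                # Undo a partially started transaction before retrying.
                if conn.in_transaction:
                    conn.execute("ROLLBACK")
                time.sleep(delay_s)
        return False, attempts
    finally:
        conn.close()


def demo():
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    holder = sqlite3.connect(path, timeout=0, isolation_level=None)
    holder.execute("CREATE TABLE t (x INTEGER)")
    # Hold an exclusive lock, standing in for a lock that is never released.
    holder.execute("BEGIN EXCLUSIVE")
    blocked = try_write_with_bounded_retry(path)
    # Once the holder commits, the same write goes through on the first try.
    holder.execute("COMMIT")
    unblocked = try_write_with_bounded_retry(path)
    holder.close()
    return blocked, unblocked
```

Calling `demo()` shows the writer giving up while the lock is held and succeeding once it is released. The attempt limit is the only thing that keeps the first call from spinning indefinitely.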

My CPU: HUAWEI Kunpeng 920 5220
My OS: openEuler 22.03

SQLite is known for its concurrency problems. Would it be a problem if we removed support for it?

Sorry, I didn't understand what you meant. Are you saying that this issue was introduced by SQLite?

Yes, we tried to use it at work for a PersistentQueue, had a lot of headaches, and moved to RocksDB.