sqlite write locks aren't respected in WSL

Question

sqlite write locks aren't respected in WSL

tdsmith opened this issue 8 years ago · 39 comments

Your Windows build number: (Type ver at a Windows Command Prompt)

10.0.15063

What you're doing and what's happening: (Copy&paste specific commands and their output, or include screen shots)

sqlite's write locking seems to be broken under WSL both inside and outside the /mnt hierarchy; attempting to write to the same database from distinct processes using normal sqlite locking throws I/O errors from sqlite and leads to database corruption.

What's wrong / what should be happening instead:

Only one process should get an exclusive database lock at a time.

Sorry I don't have a minimal reproducer or a deep understanding of how sqlite's locking works. This may or may not have the same root cause as #1927 and #1712.

Answer 1 · 2017-08-04T18:48:42.000Z

@tdsmith - Thanks for your post. Can you share any repro? Or at least some source code reference? Without that it's difficult to say what's going on.

Answer 2 · 2017-08-04T19:47:25.000Z

Here's a Python script that tries to open connections on a sqlite database from separate processes:

#!/usr/bin/env python3

from itertools import product
import multiprocessing
import sqlite3

DBNAME = "test.sqlite"

def init_db():
    conn = sqlite3.connect(DBNAME)
    conn.execute("DROP TABLE IF EXISTS myvalues")
    conn.execute("CREATE TABLE myvalues(i, j)")
    conn.commit()
    conn.close()

def do_work(i):
    conn = sqlite3.connect(DBNAME)
    my_id = multiprocessing.current_process()._identity[0]
    values = product([my_id], range(i, i+1000))
    conn.executemany("INSERT INTO myvalues VALUES(?, ?)", values) 
    conn.commit()
    conn.close()

def main():
    init_db()
    p = multiprocessing.Pool()
    p.map(do_work, range(0, 50000, 1000))
    p.close()
    p.join()

if __name__ == "__main__":
    main()

On Linux, it succeeds.

tim@neptune:~$ sqlite3 test.sqlite 'select count(*) from myvalues;'
50000

On WSL, the script fails, with the exception:

Traceback (most recent call last):
  File "/home/tim/.pyenv/versions/3.6.1/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/home/tim/.pyenv/versions/3.6.1/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "concurrent_sqlite_test.py", line 21, in do_work
    conn.commit()
sqlite3.OperationalError: disk I/O error

Running the sqlite3 CLI over the file afterwards gives unusual results (this number varies):

tim@tds:~/foo$ sqlite3 test.sqlite 'select count(*) from myvalues;'
4047

If you disable sqlite's journaling by running conn.execute("PRAGMA journal=OFF"), the file on disk ends up being corrupted.

tim@tds:~/foo$ sqlite3 test.sqlite 'select count(*) from myvalues;'
Error: disk I/O error

Answer 3 · 2017-08-04T19:50:14.000Z

Weirdly, my home directory is now in a state where I can't even create the database anymore in the parent process -- there aren't any visible lock or journal files, so I don't know what state is interfering:

tim@tds:~$ rm -f test.sqlite; python concurrent_sqlite_test.py
Traceback (most recent call last):
  File "concurrent_sqlite_test.py", line 30, in <module>
    main()
  File "concurrent_sqlite_test.py", line 23, in main
    init_db()
  File "concurrent_sqlite_test.py", line 11, in init_db
    conn.execute("DROP TABLE IF EXISTS myvalues")
sqlite3.OperationalError: disk I/O error

But I can mkdir foo; cd foo; python ../concurrent_sqlite_test.py and it resumes failing in the usual way (i.e. in commit() in a child process).

Answer 4 · 2017-08-07T04:48:17.000Z

I can confirm it still happens on 16251 build on both LxFs and DrvFS

Answer 5 · 2017-08-09T16:59:52.000Z

Looking at sqlite's pager.c, it looks like sqlite is setting POSIX advisory byte range locks with fcntl(..., F_SETLK, ...), which indeed makes this a duplicate of #1927.

Answer 6 · 2017-08-09T17:32:15.000Z

A workaround should be to pass an alternative "VFS" name to sqlite3_open_v2().

These are defined in pager.c and include unix-dotfile and unix-flock. There is a brief description in the docs.

The python3 sqlite module does not seem to support setting an alternative VFS but the apsw module does.

Answer 7 · 2017-10-27T08:54:07.000Z

NixOS seems to suffer from the same problem which can be worked around by disabling WAL mode. NixOS/nix#1203

Answer 8 · 2017-10-27T14:49:09.000Z

This issue should affect both WAL and regular journals (I was testing with regular journals), though it's possible disabling WAL makes it harder to trigger.

Answer 9 · 2018-05-04T01:23:05.000Z

Any progress on this?

Answer 10 · 2018-06-18T09:14:23.000Z

I think I have stumbled upon this issue and have a small ruby script that creates a sqlite database, imports logs data and fails like this:

sroot@sroot-msb:/mnt/c/Users/steve/Desktop/motion_sensor_logs$ ruby analyse_importtosqlite.rb                                                           
Anaylse-- import using Ruby                                                                                                                             
motion_counter_20171130090000.log                                                                                                                       
/home/sroot/.rbenv/versions/2.4.4/lib/ruby/gems/2.4.0/gems/sqlite3-1.3.13/lib/sqlite3/statement.rb:108:in `step': disk I/O error (SQLite3::IOException) 
        from /home/sroot/.rbenv/versions/2.4.4/lib/ruby/gems/2.4.0/gems/sqlite3-1.3.13/lib/sqlite3/statement.rb:108:in `block in each'                  
        from /home/sroot/.rbenv/versions/2.4.4/lib/ruby/gems/2.4.0/gems/sqlite3-1.3.13/lib/sqlite3/statement.rb:107:in `loop'                           
        from /home/sroot/.rbenv/versions/2.4.4/lib/ruby/gems/2.4.0/gems/sqlite3-1.3.13/lib/sqlite3/statement.rb:107:in `each'                           
        from /home/sroot/.rbenv/versions/2.4.4/lib/ruby/gems/2.4.0/gems/sqlite3-1.3.13/lib/sqlite3/database.rb:156:in `to_a'                            
        from /home/sroot/.rbenv/versions/2.4.4/lib/ruby/gems/2.4.0/gems/sqlite3-1.3.13/lib/sqlite3/database.rb:156:in `block in execute'                
        from /home/sroot/.rbenv/versions/2.4.4/lib/ruby/gems/2.4.0/gems/sqlite3-1.3.13/lib/sqlite3/database.rb:95:in `prepare'                          
        from /home/sroot/.rbenv/versions/2.4.4/lib/ruby/gems/2.4.0/gems/sqlite3-1.3.13/lib/sqlite3/database.rb:137:in `execute'

The error appears at different times when run (ie, it will fail at different lines of the log so I ruled out the data format causing the error).
I updated sqlite from source on my laptop, this didn't solve it.
I ran the script on an ubuntu server and it ran perfectly.

My googling suggests the problem is associated with how WSL gets to talk with the sqlite file which lead me here, but I'm rapidly moving out of my depth.

If there's interest, I can provide the ruby script and some sample logs so that others can recreate error. I'll watch this issue for replies

Answer 11 · 2019-02-27T02:02:30.000Z

Any update?

Answer 12 · 2019-03-07T10:43:16.000Z

This issue is driving me crazy right now as well, causing all sorts of random errors.
Windows Version: Microsoft Windows [Version 10.0.17763.253]
Simple steps to reproduce:

Install sqlite3 command line tool
Open two WSL windows
Acess the database in both windows
Issue two BEGIN EXCLUSIVE TRASACTION statements - they both succeed 😮
Do contradictory stuff
Issue END TRANSACTION statements - get a disk I/O error in one window.

What should happen:

One of the windows should error out with database is locked

Answer 13 · 2019-03-07T13:08:11.000Z

I also feel that this bug is a major pain. I've had destroyed subversion repositories (problems with sqllite) and e-mail databases (problem with multiple writers on Xapian indexes) due to this.

Answer 14 · 2019-03-14T23:12:32.000Z

This also affects Fossil (http://fossil-scm.org/) in that it can easily corrupt repository databases. You can reproduce by doing the following:

Running commands in two difference terminal processes (A$ and B$):

A$ export VISUAL=nano
A$ fossil init boop.fossil
A$ mkdir boop
A$ cd boop
A$ fossil open ../boop.fossil
A$ touch test
A$ fossil uv add test
A$ fossil uv edit test

B$ export VISUAL=nano
B$ cd boop
B$ fossil uv edit test
# This command should fail, but instead it works.

Normally B should fail with a message like Database error: database is locked: {INSERT INTO rcvfrom(uid, mtime, nonce, ipaddr)VALUES(1, julianday('now'), NULL, '#!fossil unversioned edit')}, however on WSL it continues normally and allow you to edit the file.

Answer 15 · 2019-03-28T19:47:43.000Z

This issue is blocked by: #1927 ... lxfs doesn't support fcntl locks of any kind, which many programs rely on.

Answer 16 · 2019-05-23T10:12:48.000Z

I've found a workaround to pause for a 0.1 seconds before writing to the database (at least when I'm writing multiple entries at once). This seems to allow SQLite to "recover" and properly handle IO or what ever is happening.

Answer 17 · 2019-11-01T00:49:06.000Z

Running into this as well. This affects visual studio which stores information it needs to process in sqlite. If it operates on a project on the shared FS between wsl2 and Windows, then this completely fails (in this case, it just hangs trying to even acquire the proper fs locks).

Something isn't working properly with this file system which breaks the entire scenario of having shared projects that can be edited by either the windows or linux side of things.

Answer 18 · 2019-11-03T09:27:58.000Z

It also breaks rasa x

Answer 19 · 2019-11-11T19:22:01.000Z

This issue made SQLite unusable for a dependent project, forcing me back to standard Windows. In my current environment, I am unable to upgrade to WSL2. I've found WSL to be otherwise a great tool for *nix power users forced to use Windows.

Answer 20 · 2019-11-30T08:45:19.000Z

Any update on this?

Answer 21 · 2019-12-21T22:30:28.000Z

Hit this issue too while trying the parallel optimization with Optuna (https://optuna.readthedocs.io/en/latest/tutorial/distributed.html), it uses sqlite to share optimization progress between the multiple processes, which is broken under WSL...

BTW, could this be resolved by upgrading to WSL2? Currently I am using WSL1 with Ubuntu 18.04.

Answer 22 · 2019-12-23T14:03:23.000Z

Yes, WSL2 is not a lightweight emulation layer. It is more of a full VM, with ext4 as the root filesystem, which supports fcntl.

…

On Sat, Dec 21, 2019 at 5:30 PM Lukáš Cerman ***@***.***> wrote: Hit this issue too while trying the parallel optimization with Optuna (https://optuna.readthedocs.io/en/latest/tutorial/distributed.html), it uses sqlite to share optimization progress between the multiple processes, which is broken under WSL... BTW, could this be resolved by upgrading to WSL2? Currently I am using WSL1 with Ubuntu 18.04. — You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

Answer 23 · 2020-06-29T22:31:54.000Z

Any users able to confirm this is unblocked in WSL2? If yes, any other blockers preventing SQLite functioning properly?

Answer 24 · 2020-06-30T11:17:25.000Z

Any users able to confirm this is unblocked in WSL2? If yes, any other blockers preventing SQLite functioning properly?

A while ago it was still broken across the OS boundary (e.g. /mnt/c/ or Plan 9 Filesystem Protocol). #2395 (comment)

Answer 25 · 2020-07-21T13:16:05.000Z

The problem seems to be fixed in WSL2. When in WSL1, I was having database lock issues, but after upgrading to WSL 2 the problems all went away, and it worked as expected. My guess is something to do with the lower level kernel code, of Linux vs Windows kernel.

Answer 26 · 2020-07-29T20:32:58.000Z

@willgozlan it is not

Answer 27 · 2020-07-30T01:49:57.000Z

@RahulDey12
It appears to me that you are not running sqlite in WSL2, but rather in Windows PowerShell (where it is apparently also broken). What happens if you start an Ubuntu Linux terminal window and run sqlite there?

Answer 28 · 2020-07-30T14:13:36.000Z

@RahulDey12
It appears to me that you are not running sqlite in WSL2, but rather in Windows PowerShell (where it is apparently also broken). What happens if you start an Ubuntu Linux terminal window and run sqlite there?

I use sqlite client called TablePlus on windows So I wanna use it from windows side. In Wsl side it is working fine.

Answer 29 · 2020-08-19T21:01:02.000Z

That screencap above on 9p is new landing zone #5762. Otherwise this was effectively /dupe #1927 from the get-go.

Answer 30 · 2020-08-19T21:01:07.000Z

Hi! We've identified this issue as a duplicate of another one that already exists in this repository. This specific instance is being closed in favor of tracking the concern over on the referenced thread.

Thanks for your report!

Answer 31 · 2020-08-20T01:40:22.000Z

Hi! We've identified this issue as a duplicate of another one that already exists in this repository. This specific instance is being closed in favor of tracking the concern over on the referenced thread.

Thanks for your report!

So what is the original issue number....

Answer 32 · 2020-08-20T07:48:37.000Z

Hi! We've identified this issue as a duplicate of another one that already exists in this repository. This specific instance is being closed in favor of tracking the concern over on the referenced thread.
Thanks for your report!

So what is the original issue number....

Bad bot! Let's feed it carrots to encourage it to learn to cite duplicates :-)
I think it means #1927 is an earlier report of the same underlying problem. That report is also closed as fixed in WSL2 but I'm no longer in a position to test

Answer 33 · 2021-07-19T13:26:22.000Z

It also breaks rasa x

i'm trying to run rasa x on wsl2 and it's breaking.! were you able to resolved it .?

Answer 34 · 2021-07-19T14:51:33.000Z

It also breaks rasa x

on wsl2 it worked for me

Answer 35 · 2021-08-30T17:13:19.000Z

Still having problems with subversion on WSL2:
`
svn: E200030: sqlite[S10]: disk I/O error
svn: E200042: Additional errors:
svn: E200030: sqlite[S10]: disk I/O error

$ uname -a
Linux 5.10.16.3-microsoft-standard-WSL2 #1 SMP Fri Apr 2 22:23:49 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
`

Answer 36 · 2021-08-31T09:05:30.000Z

Under WSL2, anything under /mnt/*/ is like a remote filesystem, that goes over the Plan 9 Filesystem Protocol (9P).

There sqlite can't lock it's database files. Everywhere else, outside the shared filesystem, sqlite will work.

Under WSL1 it's a bit different, but other breakages means sqlite database files will not work anywhere.

Answer 37 · 2021-08-31T13:24:47.000Z

A workaround I use is to install windows tortoisesvn including the command line utilities, which makes it include the windows version of svn.exe. Then in linux set svn to be aliased to svn.exe via .bashrc:

alias svn="/c/Program\ Files/TortoiseSVN/bin/svn.exe"

Bonus is that, it being a windows exe, filesystem performance is better for stuff under /mnt/*.

Answer 38 · 2022-06-03T05:12:22.000Z

The problem seems to be fixed in WSL2. When in WSL1, I was having database lock issues, but after upgrading to WSL 2 the problems all went away, and it worked as expected. My guess is something to do with the lower level kernel code, of Linux vs Windows kernel.

thanks.
having a same issue with WSL1, and the problems went away after upgrading to WSL2 by

wsl --set-version Ubuntu-18.04 2

wsl -l -v

  NAME            STATE           VERSION
* Ubuntu-18.04    Running         2

~$ cat test.py

#!/usr/bin/env python3

from itertools import product
import multiprocessing
import sqlite3

DBNAME = "test.sqlite"

def init_db():
    conn = sqlite3.connect(DBNAME)
    conn.execute("DROP TABLE IF EXISTS myvalues")
    conn.execute("CREATE TABLE myvalues(i, j)")
    conn.commit()
    conn.close()

def do_work(i):
    conn = sqlite3.connect(DBNAME)
    my_id = multiprocessing.current_process()._identity[0]
    values = product([my_id], range(i, i+1000))
    conn.executemany("INSERT INTO myvalues VALUES(?, ?)", values)
    conn.commit()
    conn.close()

def main():
    init_db()
    p = multiprocessing.Pool()
    p.map(do_work, range(0, 50000, 1000))
    p.close()
    p.join()

if __name__ == "__main__":
    main()

~$ ./test.py
~$ ls
test.py test.sqlite

Answer 39 · 2025-02-28T18:04:00.000Z

FYI this Tuesday CERN had an outage&data loss because of this locking issue. It would be appreciated if someone from Microsoft software engineering could fix this.