sqlite write locks aren't respected in WSL
tdsmith opened this issue ยท 39 comments
- Your Windows build number: (Type
ver
at a Windows Command Prompt)
10.0.15063
- What you're doing and what's happening: (Copy&paste specific commands and their output, or include screen shots)
sqlite's write locking seems to be broken under WSL both inside and outside the /mnt hierarchy; attempting to write to the same database from distinct processes using normal sqlite locking throws I/O errors from sqlite and leads to database corruption.
- What's wrong / what should be happening instead:
Only one process should get an exclusive database lock at a time.
Sorry I don't have a minimal reproducer or a deep understanding of how sqlite's locking works. This may or may not have the same root cause as #1927 and #1712.
@tdsmith - Thanks for your post. Can you share any repro? Or at least some source code reference? Without that it's difficult to say what's going on.
Here's a Python script that tries to open connections on a sqlite database from separate processes:
#!/usr/bin/env python3
from itertools import product
import multiprocessing
import sqlite3
DBNAME = "test.sqlite"
def init_db():
conn = sqlite3.connect(DBNAME)
conn.execute("DROP TABLE IF EXISTS myvalues")
conn.execute("CREATE TABLE myvalues(i, j)")
conn.commit()
conn.close()
def do_work(i):
conn = sqlite3.connect(DBNAME)
my_id = multiprocessing.current_process()._identity[0]
values = product([my_id], range(i, i+1000))
conn.executemany("INSERT INTO myvalues VALUES(?, ?)", values)
conn.commit()
conn.close()
def main():
init_db()
p = multiprocessing.Pool()
p.map(do_work, range(0, 50000, 1000))
p.close()
p.join()
if __name__ == "__main__":
main()
On Linux, it succeeds.
tim@neptune:~$ sqlite3 test.sqlite 'select count(*) from myvalues;'
50000
On WSL, the script fails, with the exception:
Traceback (most recent call last):
File "/home/tim/.pyenv/versions/3.6.1/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/home/tim/.pyenv/versions/3.6.1/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "concurrent_sqlite_test.py", line 21, in do_work
conn.commit()
sqlite3.OperationalError: disk I/O error
Running the sqlite3 CLI over the file afterwards gives unusual results (this number varies):
tim@tds:~/foo$ sqlite3 test.sqlite 'select count(*) from myvalues;'
4047
If you disable sqlite's journaling by running conn.execute("PRAGMA journal=OFF")
, the file on disk ends up being corrupted.
tim@tds:~/foo$ sqlite3 test.sqlite 'select count(*) from myvalues;'
Error: disk I/O error
Weirdly, my home directory is now in a state where I can't even create the database anymore in the parent process -- there aren't any visible lock or journal files, so I don't know what state is interfering:
tim@tds:~$ rm -f test.sqlite; python concurrent_sqlite_test.py
Traceback (most recent call last):
File "concurrent_sqlite_test.py", line 30, in <module>
main()
File "concurrent_sqlite_test.py", line 23, in main
init_db()
File "concurrent_sqlite_test.py", line 11, in init_db
conn.execute("DROP TABLE IF EXISTS myvalues")
sqlite3.OperationalError: disk I/O error
But I can mkdir foo; cd foo; python ../concurrent_sqlite_test.py
and it resumes failing in the usual way (i.e. in commit() in a child process).
I can confirm it still happens on 16251 build on both LxFs and DrvFS
A workaround should be to pass an alternative "VFS" name to sqlite3_open_v2()
.
These are defined in pager.c and include unix-dotfile
and unix-flock
. There is a brief description in the docs.
The python3 sqlite module does not seem to support setting an alternative VFS but the apsw module does.
NixOS seems to suffer from the same problem which can be worked around by disabling WAL mode. NixOS/nix#1203
This issue should affect both WAL and regular journals (I was testing with regular journals), though it's possible disabling WAL makes it harder to trigger.
Any progress on this?
I think I have stumbled upon this issue and have a small ruby script that creates a sqlite database, imports logs data and fails like this:
sroot@sroot-msb:/mnt/c/Users/steve/Desktop/motion_sensor_logs$ ruby analyse_importtosqlite.rb
Anaylse-- import using Ruby
motion_counter_20171130090000.log
/home/sroot/.rbenv/versions/2.4.4/lib/ruby/gems/2.4.0/gems/sqlite3-1.3.13/lib/sqlite3/statement.rb:108:in `step': disk I/O error (SQLite3::IOException)
from /home/sroot/.rbenv/versions/2.4.4/lib/ruby/gems/2.4.0/gems/sqlite3-1.3.13/lib/sqlite3/statement.rb:108:in `block in each'
from /home/sroot/.rbenv/versions/2.4.4/lib/ruby/gems/2.4.0/gems/sqlite3-1.3.13/lib/sqlite3/statement.rb:107:in `loop'
from /home/sroot/.rbenv/versions/2.4.4/lib/ruby/gems/2.4.0/gems/sqlite3-1.3.13/lib/sqlite3/statement.rb:107:in `each'
from /home/sroot/.rbenv/versions/2.4.4/lib/ruby/gems/2.4.0/gems/sqlite3-1.3.13/lib/sqlite3/database.rb:156:in `to_a'
from /home/sroot/.rbenv/versions/2.4.4/lib/ruby/gems/2.4.0/gems/sqlite3-1.3.13/lib/sqlite3/database.rb:156:in `block in execute'
from /home/sroot/.rbenv/versions/2.4.4/lib/ruby/gems/2.4.0/gems/sqlite3-1.3.13/lib/sqlite3/database.rb:95:in `prepare'
from /home/sroot/.rbenv/versions/2.4.4/lib/ruby/gems/2.4.0/gems/sqlite3-1.3.13/lib/sqlite3/database.rb:137:in `execute'
- The error appears at different times when run (ie, it will fail at different lines of the log so I ruled out the data format causing the error).
- I updated sqlite from source on my laptop, this didn't solve it.
- I ran the script on an ubuntu server and it ran perfectly.
My googling suggests the problem is associated with how WSL gets to talk with the sqlite file which lead me here, but I'm rapidly moving out of my depth.
If there's interest, I can provide the ruby script and some sample logs so that others can recreate error. I'll watch this issue for replies
Any update?
This issue is driving me crazy right now as well, causing all sorts of random errors.
Windows Version: Microsoft Windows [Version 10.0.17763.253]
Simple steps to reproduce:
- Install sqlite3 command line tool
- Open two WSL windows
- Acess the database in both windows
- Issue two
BEGIN EXCLUSIVE TRASACTION
statements - they both succeed ๐ฎ - Do contradictory stuff
- Issue
END TRANSACTION
statements - get a disk I/O error in one window.
What should happen:
- One of the windows should error out with
database is locked
I also feel that this bug is a major pain. I've had destroyed subversion repositories (problems with sqllite) and e-mail databases (problem with multiple writers on Xapian indexes) due to this.
This also affects Fossil (http://fossil-scm.org/) in that it can easily corrupt repository databases. You can reproduce by doing the following:
Running commands in two difference terminal processes (A$
and B$
):
A$ export VISUAL=nano
A$ fossil init boop.fossil
A$ mkdir boop
A$ cd boop
A$ fossil open ../boop.fossil
A$ touch test
A$ fossil uv add test
A$ fossil uv edit test
B$ export VISUAL=nano
B$ cd boop
B$ fossil uv edit test
# This command should fail, but instead it works.
Normally B should fail with a message like Database error: database is locked: {INSERT INTO rcvfrom(uid, mtime, nonce, ipaddr)VALUES(1, julianday('now'), NULL, '#!fossil unversioned edit')}
, however on WSL it continues normally and allow you to edit the file.
This issue is blocked by: #1927 ... lxfs doesn't support fcntl locks of any kind, which many programs rely on.
I've found a workaround to pause for a 0.1
seconds before writing to the database (at least when I'm writing multiple entries at once). This seems to allow SQLite to "recover" and properly handle IO or what ever is happening.
Running into this as well. This affects visual studio which stores information it needs to process in sqlite. If it operates on a project on the shared FS between wsl2 and Windows, then this completely fails (in this case, it just hangs trying to even acquire the proper fs locks).
Something isn't working properly with this file system which breaks the entire scenario of having shared projects that can be edited by either the windows or linux side of things.
It also breaks rasa x
This issue made SQLite unusable for a dependent project, forcing me back to standard Windows. In my current environment, I am unable to upgrade to WSL2. I've found WSL to be otherwise a great tool for *nix power users forced to use Windows.
Any update on this?
Hit this issue too while trying the parallel optimization with Optuna (https://optuna.readthedocs.io/en/latest/tutorial/distributed.html), it uses sqlite to share optimization progress between the multiple processes, which is broken under WSL...
BTW, could this be resolved by upgrading to WSL2? Currently I am using WSL1 with Ubuntu 18.04.
Any users able to confirm this is unblocked in WSL2? If yes, any other blockers preventing SQLite functioning properly?
Any users able to confirm this is unblocked in WSL2? If yes, any other blockers preventing SQLite functioning properly?
A while ago it was still broken across the OS boundary (e.g. /mnt/c/
or Plan 9 Filesystem Protocol). #2395 (comment)
The problem seems to be fixed in WSL2. When in WSL1, I was having database lock issues, but after upgrading to WSL 2 the problems all went away, and it worked as expected. My guess is something to do with the lower level kernel code, of Linux vs Windows kernel.
@willgozlan it is not
@RahulDey12
It appears to me that you are not running sqlite in WSL2, but rather in Windows PowerShell (where it is apparently also broken). What happens if you start an Ubuntu Linux terminal window and run sqlite there?
@RahulDey12
It appears to me that you are not running sqlite in WSL2, but rather in Windows PowerShell (where it is apparently also broken). What happens if you start an Ubuntu Linux terminal window and run sqlite there?
I use sqlite
client called TablePlus on windows So I wanna use it from windows side. In Wsl side it is working fine.
Hi! We've identified this issue as a duplicate of another one that already exists in this repository. This specific instance is being closed in favor of tracking the concern over on the referenced thread.
Thanks for your report!
Hi! We've identified this issue as a duplicate of another one that already exists in this repository. This specific instance is being closed in favor of tracking the concern over on the referenced thread.
Thanks for your report!
So what is the original issue number....
Hi! We've identified this issue as a duplicate of another one that already exists in this repository. This specific instance is being closed in favor of tracking the concern over on the referenced thread.
Thanks for your report!So what is the original issue number....
Bad bot! Let's feed it carrots to encourage it to learn to cite duplicates :-)
I think it means #1927 is an earlier report of the same underlying problem. That report is also closed as fixed in WSL2 but I'm no longer in a position to test
It also breaks
rasa x
i'm trying to run rasa x on wsl2 and it's breaking.! were you able to resolved it .?
It also breaks
rasa x
on wsl2 it worked for me
Still having problems with subversion on WSL2:
`
svn: E200030: sqlite[S10]: disk I/O error
svn: E200042: Additional errors:
svn: E200030: sqlite[S10]: disk I/O error
$ uname -a
Linux 5.10.16.3-microsoft-standard-WSL2 #1 SMP Fri Apr 2 22:23:49 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
`
Under WSL2, anything under /mnt/*/ is like a remote filesystem, that goes over the Plan 9 Filesystem Protocol (9P).
There sqlite can't lock it's database files. Everywhere else, outside the shared filesystem, sqlite will work.
Under WSL1 it's a bit different, but other breakages means sqlite database files will not work anywhere.
A workaround I use is to install windows tortoisesvn including the command line utilities, which makes it include the windows version of svn.exe. Then in linux set svn to be aliased to svn.exe via .bashrc:
alias svn="/c/Program\ Files/TortoiseSVN/bin/svn.exe"
Bonus is that, it being a windows exe, filesystem performance is better for stuff under /mnt/*.
The problem seems to be fixed in WSL2. When in WSL1, I was having database lock issues, but after upgrading to WSL 2 the problems all went away, and it worked as expected. My guess is something to do with the lower level kernel code, of Linux vs Windows kernel.
thanks.
having a same issue with WSL1, and the problems went away after upgrading to WSL2 by
wsl --set-version Ubuntu-18.04 2
wsl -l -v
NAME STATE VERSION
* Ubuntu-18.04 Running 2
~$ cat test.py
#!/usr/bin/env python3
from itertools import product
import multiprocessing
import sqlite3
DBNAME = "test.sqlite"
def init_db():
conn = sqlite3.connect(DBNAME)
conn.execute("DROP TABLE IF EXISTS myvalues")
conn.execute("CREATE TABLE myvalues(i, j)")
conn.commit()
conn.close()
def do_work(i):
conn = sqlite3.connect(DBNAME)
my_id = multiprocessing.current_process()._identity[0]
values = product([my_id], range(i, i+1000))
conn.executemany("INSERT INTO myvalues VALUES(?, ?)", values)
conn.commit()
conn.close()
def main():
init_db()
p = multiprocessing.Pool()
p.map(do_work, range(0, 50000, 1000))
p.close()
p.join()
if __name__ == "__main__":
main()
~$ ./test.py
~$ ls
test.py test.sqlite
FYI this Tuesday CERN had an outage&data loss because of this locking issue. It would be appreciated if someone from Microsoft software engineering could fix this.