xerial/sqlite-jdbc

Do Not Use Lock File

belugabehr opened this issue · 3 comments

Describe the bug
The SQLiteJDBCLoader class uses a "lock file" to determine whether a process is still using an extracted SQLite binary library file. This mechanism is unreliable: the lock file can be leaked, and a leaked lock file allows copies of the SQLite binary library files to accumulate in the /tmp directory.

From the File.deleteOnExit() Javadoc:

Deletion will be attempted only for normal termination of the virtual machine, as defined by the Java Language Specification.

Note: this method should not be used for file-locking, as the resulting protocol cannot be made to work reliably. The FileLock facility should be used instead.

https://docs.oracle.com/javase/8/docs/api/java/io/File.html#deleteOnExit--
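The alternative the Javadoc points to is an OS-level FileLock, which the kernel releases even when the JVM dies abnormally. A minimal sketch of what that could look like (the NativeLibraryGuard class and the ".lck" naming are hypothetical illustrations, not sqlite-jdbc's actual code):

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class NativeLibraryGuard {
    /**
     * Holds an OS-level advisory lock next to the extracted library for the
     * lifetime of this process. Unlike a marker file removed via
     * deleteOnExit(), the lock is released by the OS even when the JVM is
     * killed, so another process can reliably test for staleness.
     */
    public static FileLock acquire(Path libraryFile) throws IOException {
        FileChannel channel = FileChannel.open(
                libraryFile.resolveSibling(libraryFile.getFileName() + ".lck"),
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        return channel.lock(); // held until the channel closes or the process dies
    }

    /**
     * Probes (from a different process) whether the owning JVM is gone:
     * if tryLock() succeeds, nobody holds the lock and the copy is stale.
     */
    public static boolean isStale(Path lockFile) throws IOException {
        try (FileChannel channel = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            FileLock probe = channel.tryLock();
            if (probe != null) {
                probe.release();
                return true; // no live owner: safe to clean up
            }
            return false; // another process still holds the lock
        }
    }
}
```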

Expected behavior
Unused copies of the SQLite binary files are cleaned up reliably.

For additional context: I came across a scenario where a service on a host was stuck in a fail-retry loop. It eventually filled the host's drive with these files and, as a result, knocked out every other service on the host.

I've been looking into this question in some depth.

The biggest issue in my mind is how to clean up after oneself in failure scenarios. I think the answer is: you don't.

As things stand, every time the application starts, this library scans the entire contents of the /tmp directory. From the standpoint of a traditional, long-running production server, this is not ideal: it adds extra startup time and variability to initialization. If you are running such a server, you are best served by shipping your application with its own copy of the sqlitejdbc binary and configuring sqlite-jdbc to use that explicit copy instead of relying on the auto-deploy feature (see the sketch below).

Also, as I learned the hard way, the current cleanup code is not all that robust: it leaks files whenever both the sqlitejdbc binary and its corresponding lock file survive an abnormal termination of the JVM (e.g., a crash or a kill -9, where deleteOnExit never runs).
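For that long-running server case, sqlite-jdbc already supports pointing at an explicit native library via the org.sqlite.lib.path and org.sqlite.lib.name system properties. A minimal sketch (the /opt path and database file are placeholders):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class ExplicitNativeLib {
    public static void main(String[] args) throws SQLException {
        // Point sqlite-jdbc at a native library shipped with the application,
        // so the extract-to-/tmp auto-deploy logic is never used.
        // These must be set before the driver first loads the native library.
        System.setProperty("org.sqlite.lib.path", "/opt/myapp/native");  // placeholder path
        System.setProperty("org.sqlite.lib.name", "libsqlitejdbc.so");   // platform-specific name

        try (Connection conn =
                 DriverManager.getConnection("jdbc:sqlite:/var/data/app.db")) {
            // ... use the connection as usual
        }
    }
}
```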

If your application is transient (a short-lived Docker container, an AWS Lambda function), then the auto-deploy feature seems perfectly fine and simplifies things for you.

If you are running this from a CLI, or from multiple JVMs concurrently (and you're not shipping your own copy of the sqlitejdbc file), I think it would be most helpful to reuse any existing sqlitejdbc files in the /tmp directory. There is already code in this library to verify that a file on disk is byte-for-byte identical to the one being shipped (i.e., the copy bundled in the JAR's resources). In those scenarios, the loader should treat /tmp as a local cache: scan the directory and reuse any existing copy instead of dropping yet another copy onto disk for every invocation of the JVM. A sketch of that idea follows.
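A minimal sketch of that cache-style lookup, assuming the bundled binary's bytes are already in hand; the TempDirCache class, findOrExtract method, and prefix convention are illustrative, not the library's actual API:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TempDirCache {
    /**
     * Treats tmpDir as a cache: returns an existing copy that is
     * byte-for-byte identical to the bundled binary, or extracts a
     * fresh one only when no identical copy is found.
     */
    static Path findOrExtract(Path tmpDir, String prefix, byte[] bundled)
            throws IOException, NoSuchAlgorithmException {
        byte[] wanted = MessageDigest.getInstance("SHA-256").digest(bundled);

        List<Path> candidates;
        try (Stream<Path> files = Files.list(tmpDir)) {
            candidates = files
                    .filter(p -> p.getFileName().toString().startsWith(prefix))
                    .collect(Collectors.toList());
        }
        for (Path candidate : candidates) {
            byte[] found = MessageDigest.getInstance("SHA-256")
                    .digest(Files.readAllBytes(candidate));
            if (MessageDigest.isEqual(wanted, found)) {
                return candidate; // identical copy already on disk: reuse it
            }
        }

        // No identical copy found: drop exactly one new copy.
        Path fresh = Files.createTempFile(tmpDir, prefix, ".so");
        Files.write(fresh, bundled);
        return fresh;
    }
}
```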

PR welcome