elixir-sqlite/ecto_sqlite3

Getting DBConnection.ConnectionError on sandbox migrations

Closed this issue · 12 comments

D4no0 commented

I am trying to use a sqlite3 database with ecto. Everything seems to work without any problems on the dev environment, however when I am running tests, a lot of times I get this error:

[error] Exqlite.Connection (#PID<0.214.0>) failed to connect: ** (Exqlite.Error) database is locked
[info] == Running 20230905120403 SslMoon.Repo.Migrations.AddChecksTable.change/0 forward
[info] create table checks
** (DBConnection.ConnectionError) could not checkout the connection owned by #PID<0.223.0>. When using the sandbox, connections are shared, so this may imply another process is using a connection. Reason: connection not available and request was dropped from queue after 1760ms. This means requests are coming in and your connection pool cannot serve them fast enough. You can address this by:

  1. Ensuring your database is available and that you can connect to it
  2. Tracking down slow queries and making sure they are running fast enough
  3. Increasing the pool_size (although this increases resource consumption)
  4. Allowing requests to wait longer by increasing :queue_target and :queue_interval
    See DBConnection.start_link/2 for more information
    (ecto_sql 3.10.1) lib/ecto/adapters/sql.ex:913: Ecto.Adapters.SQL.raise_sql_call_error/1
    (elixir 1.15.5) lib/enum.ex:1693: Enum."-map/2-lists^map/1-1-"/2
    (ecto_sql 3.10.1) lib/ecto/adapters/sql.ex:1005: Ecto.Adapters.SQL.execute_ddl/4
    (ecto_sql 3.10.1) lib/ecto/migration/runner.ex:327: Ecto.Migration.Runner.log_and_execute_ddl/3
    (elixir 1.15.5) lib/enum.ex:1693: Enum."-map/2-lists^map/1-1-"/2
    (stdlib 5.0.2) timer.erl:270: :timer.tc/2
    (ecto_sql 3.10.1) lib/ecto/migration/runner.ex:25: Ecto.Migration.Runner.run/8
    (ecto_sql 3.10.1) lib/ecto/migrator.ex:362: Ecto.Migrator.attempt/8

Any idea on why this might be happening? I've read that sqlite doesn't allow concurrent inserts, however creation of db and tables shouldn't be concurrent in theory.

For reference my test_helper.exs contains also:
Ecto.Adapters.SQL.Sandbox.mode(Repo, :manual)

D4no0 commented

This also seems to not happen every time, maybe some kind of race condition is involved here?

What are your settings for your database?

Specifically :pool, :queue_target and :pool_size for your test environment.

D4no0 commented

These are the settings, the other ones are set to default:

pool: Ecto.Adapters.SQL.Sandbox,
pool_size: 10

I've tried to set pool_size to 1, but it seems the problem remains.
Also I am running this in a docker container if that helps.

How many of those tests are using async: true?

D4no0 commented

None, currently all tests that include DB interaction are disabled.

The failure always occurs at migration stage.

In our integration tests in the test_helper.exs we do this

# Load up the repository, start it, and run migrations
_ = Ecto.Adapters.SQLite3.storage_down(TestRepo.config())
:ok = Ecto.Adapters.SQLite3.storage_up(TestRepo.config())
_ = Ecto.Adapters.SQLite3.storage_down(PoolRepo.config())
:ok = Ecto.Adapters.SQLite3.storage_up(PoolRepo.config())
{:ok, _} = TestRepo.start_link()
{:ok, _pid} = PoolRepo.start_link()
# migrate the pool repo
case Ecto.Migrator.migrated_versions(PoolRepo) do
[] ->
:ok = Ecto.Migrator.up(PoolRepo, 0, Ecto.Integration.Migration, log: false)
_ ->
:ok = Ecto.Migrator.down(PoolRepo, 0, Ecto.Integration.Migration, log: false)
:ok = Ecto.Migrator.up(PoolRepo, 0, Ecto.Integration.Migration, log: false)
end
:ok = Ecto.Migrator.up(TestRepo, 0, Ecto.Integration.Migration, log: false)
Ecto.Adapters.SQL.Sandbox.mode(TestRepo, :manual)

You can ignore PoolRepo its there to test that pooled connections work.

Generally when I am working with sqlite in my personal projects I have an alias I use

def project do
  [
    # ... snip ...
    preferred_cli_env: [
      "test.prepare": :test
    ]
  ]
end

def aliases do
  [
    # ... snip ...
    "test.prepare": ["ecto.drop --quiet", "ecto.create --quiet", "ecto.migrate"]
  ]
end

The above is not necessary, but it's a nice to have.

So the workflow is usually

mix ecto.gen.migration foobar
mix ecto.migrate
# do work
# add tests
mix test.prepare
mix test
D4no0 commented

Ok, this might be promising. I will try it and come back with feedback!

Is this specified somewhere in the official documentation? because I was digging for quite some time and didn't found anything.

No it is not. We can definitely add it in as a "Suggested Workflow" pattern.

Hi @warmwaffles,

I am having the very same error over and over again, especially in CI.

17:38:52.553 [error] Exqlite.Connection (#PID<0.224.0>) failed to connect: ** (Exqlite.Error) database is locked

You can see my entire test setup on SwissSchema's GitHub, but below is the most relevant piece of code:

defmodule SwissSchemaTest do

  # ...
  @database_dir Application.compile_env!(:swiss_schema, :database_dir) # => "/tmp/swiss_schema"

  setup_all do
    File.rm_rf!(@database_dir)

    Enum.each([Repo, Repo2], fn repo ->
      db_name = "#{repo}" |> String.split(".") |> List.last() |> String.downcase()
      db_path = "#{@database_dir}/#{db_name}.db"

      start_link = Function.capture(repo, :start_link, 1)
      start_link.(database: db_path, log: false)

      Ecto.Adapters.SQLite3.storage_up(database: db_path)
      Ecto.Migrator.up(repo, 1, SwissSchemaTest.CreateUsers, log: false)
    end)

    on_exit(fn -> File.rm_rf!(@database_dir) end)
  end

  setup do: on_exit(fn -> Repo.delete_all(User) && Repo2.delete_all(User) end)

It smells like some racing condition, but I'm unable to understand where I'm creating such racing cond. Any help is appreciated :)

Does this happen all the time or is it infrequent? Does it coincide with test that fails (not the cause for the test failure)?

I'm hypothesizing that this may be an unclean shutdown of the failing database.

I haven't looked at the link yet as I am on my phone, but I'll get to it soon.

I just checked it out. The problem seems to only happen during the setup_all block at the very beginning of the test suite, then continues on fine.

I suspect you are not deleting the WAL file and some other various lock files that go along with the sqlite database. That being said, the errors stopped completely when I remove the cleanup procedures.

  setup_all do
    Enum.each([Repo, Repo2], fn repo ->
      db_name = "#{repo}" |> String.split(".") |> List.last() |> String.downcase()
      db_path = "#{@database_dir}/#{db_name}.db"

      start_link = Function.capture(repo, :start_link, 1)
      start_link.(database: db_path, log: false)

      Ecto.Adapters.SQLite3.storage_up(database: db_path)
      Ecto.Migrator.up(repo, 1, SwissSchemaTest.CreateUsers, log: false)
    end)

    :ok
  end

Edit: Also you can just call repo.start_link(database: db_path, log: false) instead of using Function.capture/3 😉

Edit2: Very peculiar indeed. You delete the entire directory which should also delete the other files. I do wonder if the File.rm_rf has not completed before recreating the directory and database.

Edit3: I do suspect that File.rm_rf hasn't actually completed before the repo is created. Injecting a Process.sleep(1_000) doesn't seem to have the logger error occur.

    File.rm_rf!(@database_dir)

    Process.sleep(1000)

    Enum.each([Repo, Repo2], fn repo ->

Edit4 (final): One other solution would be to configure a random database before each run. Trying to utilize something like "/tmp/swiss_schema/#{Ecto.UUID.generate()}/repo.db" would also help here.

The last on_exit in the setup_all never actually completes either and I still see a residual database sitting in my /tmp dir. I suspect the OS/Erlang calls to delete the file don't complete before the process exits and is aborted.

on_exit(fn -> File.rm_rf!(@database_dir) end)

Adding a Process.sleep/1 in there has not left any temp files laying around.

I hope this has been helpful for you.