romeerez/orchid-orm

Random conflicts creating schemaMigrations table when migrating empty database

Closed this issue · 14 comments

Occasionally we see random build failures with the rakedb migration where the schemaMigration table seems to be attempted to attempt to be created multiple times, and then we get a failure because it already exists.

it could be a problem in our code, but thought I'd ask if this is something someone else has encountered?

Sorry, I didn't respond earlier. Could you give more context? What command is failing? You're saying there could be a problem in your code, could you share it?

Hi,

All we are doing is running the schema migration and get some random errors both locally and on CI, which seems to be complaining that the schemaMigrations table already exists.

`
.....
(9.0ms) CREATE UNIQUE INDEX "hid_to_email_email_idx" ON "hid_to_email" ("email")
Created versions table
/home/kevin/projects/haven/haven-identity/node_modules/.pnpm/pg@8.11.3/node_modules/pg/lib/client.js:526
Error.captureStackTrace(err);
^

error: duplicate key value violates unique constraint "pg_type_typname_nsp_index"
at /home/kevin/projects/haven/haven-identity/node_modules/.pnpm/pg@8.11.3/node_modules/pg/lib/client.js:526:17
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async performQuery$1 (file:///home/kevin/projects/haven/haven-identity/node_modules/.pnpm/pqb@0.18.28/node_modules/pqb/dist/index.mjs:3985:12)
at async createSchemaMigrations (file:///home/kevin/projects/haven/haven-identity/node_modules/.pnpm/rake-db@2.10.62/node_modules/rake-db/dist/index.mjs:183:5)
at async getMigratedVersionsMap (file:///home/kevin/projects/haven/haven-identity/node_modules/.pnpm/rake-db@2.10.62/node_modules/rake-db/dist/index.mjs:2206:7)
at async migrateOrRollback (file:///home/kevin/projects/haven/haven-identity/node_modules/.pnpm/rake-db@2.10.62/node_modules/rake-db/dist/index.mjs:2256:30)
at async runCommand (file:///home/kevin/projects/haven/haven-identity/node_modules/.pnpm/rake-db@2.10.62/node_modules/rake-db/dist/index.mjs:3808:5) {
length: 256,
severity: 'ERROR',
code: '23505',
detail: 'Key (typname, typnamespace)=(schemaMigrations, 2200) already exists.',
hint: undefined,
position: undefined,
internalPosition: undefined,
internalQuery: undefined,
where: undefined,
schema: 'pg_catalog',
table: 'pg_type',
column: undefined,
dataType: undefined,
constraint: 'pg_type_typname_nsp_index',
file: 'nbtinsert.c',
line: '663',
routine: '_bt_check_unique'
}

Node.js v20.4.0

`

The other thing to add is that the logs show 2 entries Created versions table

`
Created versions table
(7.3ms) CREATE TABLE "hid" (
"hid" varchar(255) PRIMARY KEY,
"created_at" timestamp with time zone NOT NULL DEFAULT now(),
"updated_at" timestamp with time zone NOT NULL DEFAULT now()
)
(5.0ms) CREATE UNIQUE INDEX "hid_hid_idx" ON "hid" ("hid")
Migrated file:///home/kevin/projects/haven/haven-identity/adapter/hid/orchid-pgsql/dist/db/migrations/ts/20230328123026_createHid.ts
(11.4ms) CREATE TABLE "hid_to_email" (
"email" text PRIMARY KEY,
"hid" text NOT NULL,
"created_at" timestamp with time zone NOT NULL DEFAULT now(),
"updated_at" timestamp with time zone NOT NULL DEFAULT now()
)
(8.9ms) CREATE UNIQUE INDEX "hid_to_email_email_idx" ON "hid_to_email" ("email")
Created versions table
/home/kevin/projects/haven/haven-identity/node_modules/.pnpm/pg@8.11.3/node_modules/pg/lib/client.js:526
Error.captureStackTrace(err);
^

error: duplicate key value violates unique constraint "pg_type_typname_nsp_index"

`

Could you share your config/script that is launching migrations? I mean the one with rakeDb. And what is the command to launch?
Is it possible that the migrations are running in parallel by CI? You said you can reproduce this locally, so I guess it's not related to running in parallel.

Not easily I'd have to try and distill it down to the basics. I'll see what I can do.

cutdown.tar.gz

this is a cutdown version, not really doing anything odd about how we use rake apart from modifying the schema import to work with node rather than node-ts.

Earlier this week I pushed updates for the initializer script (Quickstart), and it makes a difference.

Up until recently, it was ts-node by default, and no building of migrations, and no running rake-db over compiled files. There was an option to use swc to make ts-node fast, but I had to remove that because something broke in recent releases of swc or ts-node and it stopped working.

Now it offers to choose between tsx, vite-node, and ts-node. Also will use bun if you run it with bun.
For tsx and vite-node it generates special configs for migrations, and it adds a script to package.json to build migrations, and a script to run compiled migrations.

I see that in your case you're compiling the migrations and then you run dist/main.js to run migrations. You copy ts migration files into dist so rake-db is still running over the original ts files. It works on my computer: I dropped db and test db, compiled, executed pnpm migrate, repeated several times, and it works for me, it creates two tables, creates unique indexes, for both DATABASE_URL and DATABASE_TEST_URL. So I'm still not sure what is the issue.

Why compile migrations before running? It makes sense if you compile in CI, deploy files to a server, and migrations are running on a server, in this way the command to migrate takes less time to execute. In such a case, I suggest trying the init script above choosing tsx, and then copying and adapting configs and scripts to your project.

A simple solution would be to skip the compiling step and run migrations over .ts files, and I can suggest tsx for running it as it works well for me.

Yeah - the migrations are compiled because they are run in an init container in kubernetes as part of migration.

What version of postres are you using? (just wondering).

When you ran the experiment did you start from an empty database each time. You need to start with no schemaMigration table in order to test it reliably.

My guess was that it was finding the schema migration missing for each migration, before running each script, and there was race condition which meant the schemaMigration table was tried to be created twice But could not see how that would be the case.

My steps:

  • loaded tar
  • tar -zxvf cutdown.tar.gz
    cd orchid-pgsql-cutdown/
    npm i
  • changed username, password, host in .env to my local postgres
  • created dbs manually:
    CREATE DATABASE identity; CREATE DATABASE identity_test;
  • npm run migrate

And I have no errors.

I think I know the reason:

The other thing to add is that the logs show 2 entries Created versions table

It is correct, 2 logs because there are two databases - 2nd for tests.

These logs indicate that it was successful.

error: duplicate key value violates unique constraint "pg_type_typname_nsp_index"

This happened right after this index:

CREATE UNIQUE INDEX "hid_to_email_email_idx" ON "hid_to_email" ("email")

I assumed that this happened because the index name was already taken. I tried to reproduce that, and it failed with a similar error, but the error is much more readable, it is obvious:

image

Wondering, is it also colored on your side? If yes, we can see the exact SQL that failed by the color.

I'm using Postgres 16, not so long ago I was using Postgres 15 and all was fine. It's possible that due to an old version your error logs are less human-friendly.

If you check the first error I posted it shows that it is schemaMigrations that is causing the error.

migrations-error

The code generating the error is this:
try { await db.query(CREATE TABLE ${quoteWithSchema({
name: config.migrationsTable
})} ( version TEXT NOT NULL )); (_b = config.logger) == null ? void 0 : _b.log("Created versions table"); } catch (err) { console.log('XXXXXXXX ' + err.code) if (err.code === "42P07") { (_c = config.logger) == null ? void 0 : _c.log("Versions table exists"); } else { throw err; } }

This is failing - but the code seems to be relying on a specific error code to ignore if it already exists. I am getting a different error code 23505. Perhaps the error code returned has a race condition, which is why the bug is non-deterministic?

Thank you for finding this SO question, it helped and now I can reproduce the issue.
It happens when the migration script is running in parallel by more than one instance.

Now I can figure this out, going to add a lock - pg advisory lock as in the linked question, or a lock table as in other similar tools.

The fix is ready, now it should be fine!

Published in rake-db 2.11.0

@krichards sorry that it took more than a month to figure out and fix, please let me know if the fix works for you.

So it was solved back then, should be fine to close this.