insitro/redun

postgresql database doesn't upgrade version when using Scheduler.

njbernstein opened this issue · 6 comments

Hi all,

I'm following the instructions here: https://github.com/insitro/redun/blob/main/examples/postgres/README.md

But trying to run using a Scheduler rather than from a cli and I'm running into the following issue.
When I run the following:

scheduler = Scheduler(
    config=Config(
        {
            "backend": {
                "db_uri": "postgresql://localhost:5432/redun",
            },
            "executors.default": {
                "type": "local",
                "max_workers": 40,
            },
        }
    )
)

This doesnt return the usual db version upgrade message Upgrading db from version -1.0 to 3.2...

And when I run
scheduler.load()

I unsurprisingly get the following error:

RedunVersionError: redun database is an incompatible version -1.0. Version must be within >=3.1,<4

How can I get a postgres database upgraded?

Ah looks like I have to manually run:

scheduler.backend.create_engine()
scheduler.backend.migrate()

And im good to go.

Good to hear you figured out the migration. We have more documented here:

https://insitro.github.io/redun/db.html#database-migration

Briefly, for convenience we auto-upgrade sqlite databases, but require explicit upgrades for Postgres databases since those tend to be shared with other clients and more coordination is likely needed.

We have a CLI command for doing the equivalent of that python code. For example,

redun db upgrade

@mattrasmus I'm a relative novice with SQL. Could you give your advice on postgres vs sqlite?

I've had some redun workflows fail because of database locks with sqlite but at least each workflow gets its own sqlite database.

I know postgres in general should be better at avoiding this issue, but I'm concerned about having multiple workflows pointed at the same postgres database. Should i be?

Or should I be spinning up a new postgres database for every workflow?

I've had some redun workflows fail because of database locks with sqlite but at least each workflow gets its own sqlite database.

I know postgres in general should be better at avoiding this issue, but I'm concerned about having multiple workflows pointed at the same postgres database. Should i be?

Yep, once you have multiple redun schedulers writing to one db simultaneously, then postgres is the appropriate choice. Sqlite will have locking issues for simultaneous writes, which it sounds like you're seeing.

Or should I be spinning up a new postgres database for every workflow?

You can comfortably use one postgres for all workflows. For example, we use one central db for all production workflows across our internal team. redun will use unique ids and hashes to keep such combined recordings correct. Hopefully, that helps.

It does!

@mattrasmus do you use a volume to ensure persistence? or backup to athena or the like?

@mattrasmus do you use a volume to ensure persistence? or backup to athena or the like?

For Postgres, we run it in RDS. That way we can have redun schedulers running in EC2s, Batch jobs, laptops, etc.