init & import workflow

Question

Florents-Tselai opened this issue a year ago · 3 comments

I feel that the init may be redundant and/or should be done implicitly.
For first-time users, I'd like them to be able to get up and running ASAP.
That is:

warcdb import f1.warc f2.warcz f3.warc.gz ...

An initialization step is necessary, however, to ensure that we're in sync with the "current" relational representation of a WARC file. That representation will undoubtedly change, either drastically (table renames) or incrementally (OLAP-like views being added).

In practical terms, that means that when an import command is issued, the following steps happen:

When creating a new archive (the file we're importing into does not exist), just use the latest schema and proceed normally.
If the archive file exists, figure out its current version as stored in the DB. If the package version is newer, apply migrations and proceed with the import.
If the package version is older than the archive's version, abort and prompt the user to upgrade the package.
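
The three cases above can be sketched as a small decision function. This is only an illustration of the proposed logic, not warcdb's actual code: `parse_version` and `plan_import` are hypothetical names, versions are assumed to be plain dotted integers such as `0.2.0`, and an `archive_version` of `None` is assumed to mean that the archive file does not exist yet.

```python
def parse_version(text):
    """Turn a dotted version such as '0.2.0' into a comparable tuple (0, 2, 0)."""
    return tuple(int(part) for part in text.split("."))


def plan_import(archive_version, package_version):
    """Decide what an import should do before touching any data.

    archive_version is None when the archive file does not exist yet
    (an assumption made for this sketch).
    """
    if archive_version is None:
        # New archive: create it with the latest schema.
        return "create"
    archive = parse_version(archive_version)
    package = parse_version(package_version)
    if package > archive:
        # Existing archive is behind the package: migrate, then import.
        return "migrate"
    if package < archive:
        # Archive was written by a newer package: stop and ask for an upgrade.
        return "abort"
    # Versions match: import directly.
    return "import"
```

Comparing parsed tuples rather than raw strings avoids ordering `0.10.0` before `0.9.0`.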

Notes

The current package (application) and schema versions are coupled, and I don't see a reason to change that.
IIRC, v0.1.0 stored no version information in the DB, which is a shame. So if no version is found we should assume v0.1.0 by default, and store the version explicitly for v0.2.0 and later.
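
One possible way to store the version and fall back to v0.1.0 when it is missing, sketched with the standard-library `sqlite3` module. The `_warcdb_meta` table, its `schema_version` key, and both function names are assumptions made for illustration; nothing in this thread specifies where the version would actually live.

```python
import sqlite3

# Assumed fallback for archives that carry no version information.
DEFAULT_VERSION = "0.1.0"


def read_schema_version(conn):
    """Return the stored schema version, or DEFAULT_VERSION if none is stored."""
    table = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name = '_warcdb_meta'"
    ).fetchone()
    if table is None:
        # Hypothetical metadata table is absent: treat as a v0.1.0 archive.
        return DEFAULT_VERSION
    row = conn.execute(
        "SELECT value FROM _warcdb_meta WHERE key = 'schema_version'"
    ).fetchone()
    return row[0] if row else DEFAULT_VERSION


def write_schema_version(conn, version):
    """Store the schema version explicitly, creating the table if needed."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS _warcdb_meta "
        "(key TEXT PRIMARY KEY, value TEXT NOT NULL)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO _warcdb_meta (key, value) "
        "VALUES ('schema_version', ?)",
        (version,),
    )
    conn.commit()
```

For example, `read_schema_version(sqlite3.connect(":memory:"))` returns the default on an empty database, and returns whatever was last passed to `write_schema_version` afterwards.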

Answer 1 · 2023-10-21T13:46:31.000Z

I understand wanting to do away with init. It reminded me of git, which I liked. But if you really don't like it perhaps an implicit migrate could happen whenever you run import?

Answer 2 · 2023-10-21T13:50:22.000Z

Yes, pretty much what I described above, right?

Answer 3 · 2023-10-21T15:59:30.000Z

I think so!