init & import workflow
Florents-Tselai opened this issue · 3 comments
I feel that the init step may be redundant and/or should be done implicitly.
For first-time users, I'd like them to be able to get up and running ASAP.
That is:
warcdb import f1.warc f2.warcz f3.warc.gz ...
An initialization step, however, is necessary to ensure that we're in sync with the "current" relational representation of a WARC file. And that representation will undoubtedly change, either drastically (table renames) or incrementally (OLAP-like views will be added).
In practical terms, that means that when an import command is issued, the following steps happen:
- When creating a new archive (the file we're importing into does not exist), just use the latest schema and proceed normally.
- If the archive file exists, figure out its current version as stored in the DB. If the package version is newer, apply migrations and proceed with the import.
- If the package version is older than the archive's version, abort and prompt the user to upgrade the package.
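The three cases above boil down to one decision made before any data is touched. A minimal sketch, assuming versions are dotted integer strings like "0.2.0" and that a missing archive is signalled by passing None; the function names are hypothetical, not part of warcdb:

```python
from typing import Optional, Tuple


def parse_version(v: str) -> Tuple[int, ...]:
    """Turn "0.2.0" into (0, 2, 0) so versions compare numerically, not as text."""
    return tuple(int(part) for part in v.split("."))


def plan_import(archive_version: Optional[str], package_version: str) -> str:
    """Decide what an import should do (hypothetical helper).

    archive_version is None when the target archive file does not exist.
    Returns "create", "migrate" or "import"; raises if the archive is newer.
    """
    if archive_version is None:
        # New archive: just use the latest schema.
        return "create"
    stored = parse_version(archive_version)
    current = parse_version(package_version)
    if stored < current:
        # Package is newer: apply migrations, then import.
        return "migrate"
    if stored > current:
        # Package is older than the archive: abort and ask for an upgrade.
        raise RuntimeError(
            f"Archive schema {archive_version} is newer than package "
            f"{package_version}; please upgrade warcdb."
        )
    # Versions match: import straight away.
    return "import"
```

Comparing tuples rather than strings matters once a component reaches two digits: "0.10.0" sorts before "0.9.0" as text but after it as (0, 10, 0).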
Notes
- The current package (application) and schema versions are coupled, and I don't see a reason to change that.
- IIRC, no such data was stored in the DB for v0.1.0, which is a shame, so we should assume v0.1.0 by default when no version is found, and store the version explicitly for v0.2.0 and later.
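One way to store the version and fall back to v0.1.0 for older archives is a small key/value table inside the SQLite file. This is a sketch only: the table name `_warcdb_meta` and the `schema_version` key are hypothetical, not warcdb's actual schema.

```python
import sqlite3

# Hypothetical metadata table; the real warcdb schema may differ.
META_TABLE = "_warcdb_meta"
# v0.1.0 archives carry no version, so a missing value means v0.1.0.
DEFAULT_VERSION = "0.1.0"


def read_schema_version(conn: sqlite3.Connection) -> str:
    """Return the stored schema version, or DEFAULT_VERSION if none is recorded."""
    has_table = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (META_TABLE,),
    ).fetchone()
    if not has_table:
        return DEFAULT_VERSION
    row = conn.execute(
        f"SELECT value FROM {META_TABLE} WHERE key = 'schema_version'"
    ).fetchone()
    return row[0] if row else DEFAULT_VERSION


def write_schema_version(conn: sqlite3.Connection, version: str) -> None:
    """Record the schema version explicitly (v0.2.0 and later)."""
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {META_TABLE} "
        "(key TEXT PRIMARY KEY, value TEXT)"
    )
    # INSERT OR REPLACE overwrites the previous value after a migration.
    conn.execute(
        f"INSERT OR REPLACE INTO {META_TABLE} (key, value) "
        "VALUES ('schema_version', ?)",
        (version,),
    )
    conn.commit()
```

Writing the version on every create or migration keeps the stored value coupled to the package version, as the first note suggests.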
I understand wanting to do away with init. It reminded me of git, which I liked. But if you really don't like it, perhaps an implicit migrate could happen whenever you run import?
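An implicit migrate could be a hook that import calls first, applying every pending migration in version order. A minimal sketch, assuming migrations are plain functions keyed by the version they upgrade to; `MIGRATIONS`, `migrate` and `before_import` are hypothetical names, and the single migration body is a placeholder:

```python
import sqlite3
from typing import Callable, Dict, List, Tuple


def _to_0_2_0(conn: sqlite3.Connection) -> None:
    # Placeholder for an incremental change such as adding a view.
    conn.execute(
        "CREATE VIEW IF NOT EXISTS table_list AS "
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    )


# Hypothetical registry: target version -> function that upgrades the schema.
MIGRATIONS: Dict[str, Callable[[sqlite3.Connection], None]] = {
    "0.2.0": _to_0_2_0,
}


def parse_version(v: str) -> Tuple[int, ...]:
    return tuple(int(part) for part in v.split("."))


def migrate(conn: sqlite3.Connection, from_version: str, to_version: str) -> List[str]:
    """Apply, oldest first, every migration in (from_version, to_version]."""
    lo, hi = parse_version(from_version), parse_version(to_version)
    pending = sorted(
        (v for v in MIGRATIONS if lo < parse_version(v) <= hi),
        key=parse_version,
    )
    for v in pending:
        MIGRATIONS[v](conn)
    conn.commit()
    return pending


def before_import(conn: sqlite3.Connection, archive_version: str,
                  package_version: str) -> List[str]:
    """Run implicitly at the start of import; WARC parsing is omitted here."""
    return migrate(conn, archive_version, package_version)
```

Because already-applied versions fall outside the half-open range, running the hook on an up-to-date archive is a no-op, so import can call it unconditionally.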
Yes, pretty much what I described above, right?
I think so!