pacificclimate/pycds

Refactor tests

Closed this issue · 3 comments

Original issue title: Table definitions need to be versioned
Original issue:

At present, precisely one version of each table proper (not views, etc.) is defined; they are all in pycds/orm/tables.py. By contrast, view, matview, and function definitions are structured so that there can be many versions, corresponding to different revisions.

This arrangement supports testing; specifically, it enables tests of older revisions to remain valid even after later revisions have changed the definitions of the items (views, etc.) involved. Each revision of an item is tested by code targeted at the revised version. Such tests are clear and simple and need no reference to history, apart from their own moment in it. Also, since in principle any revision is always a valid state of the database, and since new and existing databases may be migrated through or to those revisions, we don't want to lose visibility of what those revisions were intended to do and how they were tested. Hence versioned, always-available definitions of items.

Note: Each release of the ORM picks out the latest revision of each item and publishes it at the top level pycds module.
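
For illustration, that "pick out the latest" step amounts to re-exports in the top-level module; a minimal sketch, assuming hypothetical module paths, revision ids, and item names (not necessarily the actual pycds layout):

    # pycds/__init__.py (sketch only; paths, revision ids, and names are illustrative)
    # Re-export the latest revision of each item so that
    # "from pycds import CrmpNetworkGeoserver" always refers to the current definition.
    from pycds.orm.views.version_522eed334c85 import CrmpNetworkGeoserver
    from pycds.orm.native_matviews.version_522eed334c85 import VarsPerHistory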

Now that we are changing table definitions (new and changed columns, etc.), we need to have multiple versions of the table definitions too. If we don't, both the historical definitions of items and the tests of those items fail, because they assume a specific version of the table definitions, which have in fact changed.

The fix for this is straightforward:

  • Structure the table definitions like the view and matview definitions (see the sketch after this list).
  • Move the current version of the table definitions into a submodule named for the appropriate revision (522eed334c85, I think).
  • Modify the definitions of other items (views etc.) to use the appropriate revision of the table definitions.
  • Tests likewise.
  • Consider whether it is worth it to define a full set of definitions for each revision. These of course would be pickers, similar to the main publish-latest-version strategy, and not redundant definitions of unchanged tables in each revision.
    • I'm thinking not, and that it might actually be worse in the case of simultaneous updates of tables and other objects.
  • In migrations that modify table definitions, create suitable new revision versions and use them where required. This is really the point of this effort.
  • Publish latest versions of table definitions up the hierarchy in the same way that views etc. are.
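
A rough sketch of what the versioned table layout might look like, mirroring the existing view/matview arrangement; the paths, class, and column names here are illustrative, not actual pycds definitions:

    # pycds/orm/tables/version_522eed334c85/__init__.py  (sketch)
    from sqlalchemy import Column, Integer, String
    from sqlalchemy.orm import declarative_base

    Base = declarative_base()

    class Network(Base):
        """Example table definition frozen at this revision."""
        __tablename__ = "meta_network"
        network_id = Column(Integer, primary_key=True)
        name = Column(String)

    # A view or matview defined at some later revision would then import the
    # table definitions of the revision it was written against, e.g.:
    #   from pycds.orm.tables.version_522eed334c85 import Network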

I have to admit that I'm a little skeptical as to whether the prospective payoff here would be worth the added complexity of maintaining tests for every schema migration. I understand the use case... but... do we really need to be exercising the tests during development for code that we have theoretically moved on from?

I.e., the tests are for the benefit of the developer and of the code they are working on now: to know that the code they have written works, and to know that the code still in the code base that may interact with their code continues to work. What's the value of running tests for code that is now, by definition, obsolete?

I can see where this would make sense if we're actively running and developing on multiple versions of the schema. We may operate on multiple versions of the schema (for "reasons"), but I don't think that that's something that we actively aspire to. Generally we want all of our applications to be running the most recent version of the schema.

I may be willing to go along with this, but I'd like to push back a bit and have us consider the value proposition for doing so...

OK, this prompts me to think this through more carefully. Basically, I agree, except for a small subset of tests.

There are two kinds of tests (both sketched below):

  • Migration tests, aka smoke tests, which test the operation of a single migration and whether it modifies the schema (structure) as intended.
  • Functional tests, which test the behaviour of a database schema object after migration. E.g., test whether a view or materialized view contains the rows expected given a certain database content.
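
To make the distinction concrete, here is a minimal sketch of each kind, assuming a pytest setup; the fixtures (alembic_config, engine, engine_at_head, sample_data) and the table, column, and view names are hypothetical:

    from alembic import command
    from sqlalchemy import inspect, text

    # Migration (smoke) test: run one migration, check the resulting structure.
    def test_migration_adds_column(alembic_config, engine):
        command.upgrade(alembic_config, "522eed334c85")
        columns = {c["name"] for c in inspect(engine).get_columns("meta_network")}
        assert "publish" in columns

    # Functional test: check the behaviour of an object after migration.
    def test_view_contents(engine_at_head, sample_data):
        with engine_at_head.connect() as conn:
            rows = conn.execute(text("SELECT * FROM crmp_network_geoserver")).fetchall()
        assert len(rows) == sample_data.expected_row_count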

Both kinds of tests are presently placed in a per-migration folder, paralleling the arrangement of the migration scripts they test. Each folder tests the results of that migration alone. That's not completely necessary:

  • Migration tests (smoke tests) for each migration must continue in roughly this pattern. Unless we remove all but the latest migration test, something to which I feel some resistance, for reasons I can't quite articulate at the moment.
  • Functional tests can be moved to a single collection that tests the result of migrating to the current head revision in the repo (see the fixture sketch after this list). When the current head changes (a new migration is added), tests for all objects that have been affected are also updated. Past tests are in past commits, but not visible in the latest commit. This is almost certainly a better arrangement: simpler and more understandable.
  • I worry there may be cases when we need functional tests per-migration, but likely we should face that if and when it arises. It will likely reveal itself as we refactor (functional) tests this way.
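
One way to pin the functional-test collection to the repo head is a shared fixture that always upgrades the test database to whatever the current head revision is; a sketch, with hypothetical fixture names:

    # conftest.py for the single functional-test collection (sketch)
    import pytest
    from alembic import command
    from alembic.script import ScriptDirectory

    @pytest.fixture
    def engine_at_head(alembic_config, engine):
        """Migrate the test database to the current head revision, so that
        functional tests always exercise the latest schema."""
        head = ScriptDirectory.from_config(alembic_config).get_current_head()
        command.upgrade(alembic_config, head)
        return engine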

We do need to continue maintaining past versions of all definitions of replaceable database objects (non-mutable objects; basically, anything except a table). This is because migration scripts that modify these objects require both the original and revised definition of each object. This isn't optional; it's built into how replaceable objects work. Tests are not the only (or even main) reason for keeping past versions of these definitions.
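
For reference, this is why both definitions must remain importable: a migration that revises a replaceable object needs the old definition to drop (or restore) it and the new definition to create it. A sketch, following the general Alembic "replaceable object" pattern; the operation names and module paths are placeholders and may not match pycds's actual helpers:

    from alembic import op
    from pycds.orm.views.version_OLD import MyView as OldMyView   # placeholder paths
    from pycds.orm.views.version_NEW import MyView as NewMyView

    def upgrade():
        op.drop_view(OldMyView)     # requires the *old* definition
        op.create_view(NewMyView)   # requires the *new* definition

    def downgrade():
        op.drop_view(NewMyView)
        op.create_view(OldMyView)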

We don't need to maintain past versions of table definitions this way, because they aren't managed as replaceable objects (because they are mutable). This is probably what I was thinking about when I set things up as they are.
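
By contrast, a table change is expressed as an in-place alteration using standard Alembic operations, which needs only the delta rather than two complete definitions; e.g. (table and column names illustrative):

    import sqlalchemy as sa
    from alembic import op

    def upgrade():
        op.add_column("meta_network", sa.Column("publish", sa.Boolean(), nullable=True))

    def downgrade():
        op.drop_column("meta_network", "publish")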

In the end, this issue becomes "Refactor tests" rather than "Version tables."