Bug: `test_n3c` schema not fully / properly set up (& tests)

Overview

Tables are created and data is copied over from the n3c schema. However, no additional constraints, such as primary keys (PKs), are set.

Sub-tasks

(3) and (4) are currently lower priority because I'm not sure that any tests rely on them, especially (4), which would also be time-consuming to do. I think no tests rely on this setup because I'm mostly just using test_n3c for inserts, and other tests execute against the actual n3c schema.

  • 1. Set up the test_n3c schema with PKs
    • This will probably require some kind of config. It might be best to tie that config to initialize(), or elsewhere if needed. (See the sketch after this list.)
    • Tests: Reactivate the IntegrityError tests that check for inserting duplicate records
  • 3. Anything else simple for test_n3c worth setting up? Constraints (e.g. NOT NULL)?
    • Probably not indexes, since these tables will be quite small (~50 rows).
  • 4. Correct relational data between tables
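
A minimal sketch of what (1) and the reactivated test could look like, assuming SQLAlchemy and a pytest engine fixture. The table-to-PK mapping below is an assumption based on typical n3c column names (codeset_id, concept_id), not the actual TermHub config:

```python
import pytest
from sqlalchemy import text
from sqlalchemy.exc import IntegrityError

# Hypothetical config: test_n3c tables mapped to their assumed PK columns.
TEST_SCHEMA_PKS = {
    'code_sets': ['codeset_id'],
    'concept': ['concept_id'],
    'concept_set_members': ['codeset_id', 'concept_id'],
}

def add_test_schema_pks(engine):
    """Add PK constraints to the freshly copied test_n3c tables."""
    with engine.begin() as con:
        for table, cols in TEST_SCHEMA_PKS.items():
            con.execute(text(
                f"ALTER TABLE test_n3c.{table} ADD PRIMARY KEY ({', '.join(cols)});"))

def test_insert_duplicate_raises(engine):
    """Reactivated IntegrityError test: re-inserting an existing row should fail."""
    with engine.begin() as con:
        with pytest.raises(IntegrityError):
            # Duplicates a row that is already in the table, violating the PK.
            con.execute(text(
                "INSERT INTO test_n3c.code_sets "
                "SELECT * FROM test_n3c.code_sets LIMIT 1;"))
```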

4. Correct relational data between tables

Basically, we're setting up these tables by copying the first 50 rows from each table. However, this is not correct from a "relational data" perspective.

What's meant by "relational data" is the following: a table like code_sets is primary; perhaps it is the most (or only) primary table. For every code set in that table, we only need the entries in concept_set_container, concept_set_version_item, and concept_set_members that apply to those code sets. Further, we can then filter the concept table to include only the concepts listed in that members table. Then we can filter concept_relationship and concept_ancestor down to what remains. Once these core tables are set up, any derived tables can be updated by running refresh_derived_tables().
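
A rough sketch of that cascade, written as SQL subsetting against the existing n3c tables. The linking columns (codeset_id, concept_id_1, etc.) are assumptions from OMOP/N3C conventions, not verified against the actual schema:

```python
from sqlalchemy import text

# Each statement subsets a table to only the rows reachable from the sampled
# code sets. Linking columns are assumed, not taken from the real schema.
SUBSET_STEPS = [
    # code_sets is the primary table: just take the first 50 rows.
    "CREATE TABLE test_n3c.code_sets AS SELECT * FROM n3c.code_sets LIMIT 50;",
    # Keyed by codeset_id: keep only rows for those 50 code sets.
    """CREATE TABLE test_n3c.concept_set_members AS
       SELECT * FROM n3c.concept_set_members
       WHERE codeset_id IN (SELECT codeset_id FROM test_n3c.code_sets);""",
    # concept: only concepts listed in the members table.
    """CREATE TABLE test_n3c.concept AS
       SELECT * FROM n3c.concept
       WHERE concept_id IN (SELECT concept_id FROM test_n3c.concept_set_members);""",
    # concept_relationship: only edges where both endpoints were kept.
    """CREATE TABLE test_n3c.concept_relationship AS
       SELECT * FROM n3c.concept_relationship
       WHERE concept_id_1 IN (SELECT concept_id FROM test_n3c.concept)
         AND concept_id_2 IN (SELECT concept_id FROM test_n3c.concept);""",
    # ...same idea for concept_ancestor, concept_set_container, and
    # concept_set_version_item, each via its own linking column.
]

def make_test_schema(engine):
    with engine.begin() as con:
        for step in SUBSET_STEPS:
            con.execute(text(step))
    # Derived tables can then be rebuilt, e.g. via refresh_derived_tables().
```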

Perhaps the best way to achieve this is by updating initialize() so that its "set up test schema" step does its own initialization: subsetting the code_sets dataset first, then filtering the other datasets to match, and then uploading (a sketch of this follows below). But this will also likely be slower than doing something similar with the already existing SQL tables, as sketched above.
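
For comparison, the dataset-side variant might look like this. This is a sketch only; the file paths, column names, and the eventual upload step are all hypothetical:

```python
import pandas as pd

def subset_datasets(dataset_dir: str, n: int = 50) -> dict[str, pd.DataFrame]:
    """Subset the code_sets dataset first, then filter dependents to match."""
    code_sets = pd.read_csv(f'{dataset_dir}/code_sets.csv').head(n)
    members = pd.read_csv(f'{dataset_dir}/concept_set_members.csv')
    members = members[members['codeset_id'].isin(code_sets['codeset_id'])]
    concept = pd.read_csv(f'{dataset_dir}/concept.csv')
    concept = concept[concept['concept_id'].isin(members['concept_id'])]
    # ...same pattern for concept_relationship, concept_ancestor, etc.,
    # then upload each subset to the test_n3c schema.
    return {'code_sets': code_sets, 'concept_set_members': members,
            'concept': concept}
```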

We also have to consider how slow this is. Right now I'm re-creating the test schema at the start of every test suite run. If that's too slow, we could consider adding some sort of caching, but we'd have to commit those cached files too; otherwise the GitHub Actions tests would also run quite slowly. A rough sketch of the caching idea follows.
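
One way the caching could work: fingerprint the source datasets and reuse a committed SQL dump when nothing has changed. Everything here is hypothetical, including the paths and the rebuild_test_schema() helper; pg_dump/psql are only one possible mechanism:

```python
import hashlib
import subprocess
from pathlib import Path

# Committed to the repo so the GitHub Actions runners can use the cache too.
CACHE_DUMP = Path('test/cache/test_n3c.dump.sql')

def datasets_fingerprint(dataset_dir: Path) -> str:
    """Hash the source CSVs so we can tell when the cache is stale."""
    h = hashlib.sha256()
    for path in sorted(dataset_dir.glob('*.csv')):
        h.update(path.read_bytes())
    return h.hexdigest()

def load_or_rebuild_test_schema(dataset_dir: Path, db_url: str):
    fp_file = CACHE_DUMP.with_suffix('.sha256')
    fp = datasets_fingerprint(dataset_dir)
    if CACHE_DUMP.exists() and fp_file.exists() and fp_file.read_text() == fp:
        # Cache hit: restore the committed dump instead of rebuilding.
        subprocess.run(['psql', db_url, '-f', str(CACHE_DUMP)], check=True)
    else:
        # Cache miss: do the slow rebuild (hypothetical helper), then re-dump.
        rebuild_test_schema(dataset_dir, db_url)
        subprocess.run(['pg_dump', db_url, '--schema=test_n3c',
                        '-f', str(CACHE_DUMP)], check=True)
        fp_file.write_text(fp)
```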