Bug: `test_n3c` schema not fully / properly set up (& tests)
## Overview
Tables are created and data is copied over from the `n3c` schema. However, no additional constraints, such as primary keys, are set.
## Sub-tasks
(3) and (4) are currently lower priority because I'm not sure that any tests rely on them, and (4) especially would be time-consuming to do. I think no tests rely on this setup because I'm mostly just using `test_n3c` for inserts, and other tests execute against the actual `n3c` schema.
1. Set up `test_n3c` schema for PKs
   - This will probably include some kind of config. Maybe it'd be optimal to tie that config to `initialize()` or elsewhere if needed.
2. Tests: Reactivate `IntegrityError` tests which check for inserting duplicate records.
3. Anything else simple for `test_n3c` worth setting up? Constraints (e.g. `NOT NULL`)?
   - Probably not indexes, since these tables will be quite small (~50 rows).
4. Correct relational data between tables
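As a rough sketch of what (1) and (2) would buy us, here is a minimal illustration using in-memory SQLite purely for demonstration (the real schema presumably lives elsewhere, and the `code_sets` columns here are guessed, not the actual DDL): once a primary key exists on a test table, a duplicate insert raises `IntegrityError`, which the reactivated tests could assert on.

```python
import sqlite3

# Hypothetical sketch, not project code: a PK on the test table makes
# duplicate inserts fail loudly instead of silently succeeding.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE code_sets (codeset_id INTEGER PRIMARY KEY, concept_set_name TEXT)"
)
conn.execute("INSERT INTO code_sets VALUES (1, 'example set')")
try:
    conn.execute("INSERT INTO code_sets VALUES (1, 'duplicate id')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True: the PK constraint blocks the duplicate row
```

Without the PK (the current `test_n3c` state), the second insert would succeed and any duplicate-record test would have nothing to catch.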
### 4. Correct relational data between tables
Basically, we're setting up these tables by copying the first 50 rows from each table. However, this is not correct from a "relational data" perspective.
What's meant by "relational data" is like so: a table like `code_sets` is primary; perhaps it is the most (or only) primary table. For every code set in that table, we only need the entries in `concept_set_container`, `concept_set_version_item`, and `concept_set_members` that apply to those code sets. Further, we can then filter the `concept` table to include only those concepts listed in that members table. Then we can filter `concept_relationship` and `concept_ancestor` by what's there. Once these core tables are set up, any derived tables can be updated by running `refresh_derived_tables()`.
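The cascading filter above can be sketched like so, again using in-memory SQLite only for illustration (table and column names are simplified guesses at the real schema, and the toy data is fabricated): subset the primary table first, then restrict each dependent table to rows that reference the surviving keys.

```python
import sqlite3

# Hypothetical sketch of "relational" subsetting, not project code.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE code_sets (codeset_id INTEGER PRIMARY KEY);
CREATE TABLE concept_set_members (codeset_id INTEGER, concept_id INTEGER);
CREATE TABLE concept (concept_id INTEGER PRIMARY KEY);
""")
conn.executemany("INSERT INTO code_sets VALUES (?)", [(i,) for i in range(100)])
conn.executemany("INSERT INTO concept_set_members VALUES (?, ?)",
                 [(i, 1000 + i) for i in range(100)])
conn.executemany("INSERT INTO concept VALUES (?)", [(1000 + i,) for i in range(100)])

# 1. Primary table: keep only the first 50 code sets.
subset_ids = [r[0] for r in conn.execute(
    "SELECT codeset_id FROM code_sets ORDER BY codeset_id LIMIT 50")]
# 2. Keep only member rows that reference the kept code sets.
ph = ",".join("?" * len(subset_ids))
members = conn.execute(
    f"SELECT codeset_id, concept_id FROM concept_set_members "
    f"WHERE codeset_id IN ({ph})", subset_ids).fetchall()
# 3. Keep only concepts referenced by the surviving member rows.
concept_ids = sorted({m[1] for m in members})
ph2 = ",".join("?" * len(concept_ids))
concepts = conn.execute(
    f"SELECT concept_id FROM concept WHERE concept_id IN ({ph2})",
    concept_ids).fetchall()
print(len(members), len(concepts))  # 50 50
```

The same "filter by surviving keys" step would repeat for `concept_relationship` and `concept_ancestor`, after which the derived tables get refreshed.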
Perhaps the best way to achieve this is by updating `initialize()` so that the "setup test schema" part of it does its own initialization: subsetting the `code_set` dataset first, then filtering the other datasets as described above, and then uploading. But this will likely be slower than doing something similar using the already existing SQL tables.
Also, we have to consider how slow this is. Right now I'm remaking the test schema at the start of every test suite. If it's too slow, we could consider adding some sort of caching. But we'd have to commit those cached files too; otherwise the GitHub Actions tests would also run quite slowly.
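One possible shape for that caching, sketched under assumed names (nothing here exists in the codebase): fingerprint the source data and rebuild the test-schema dump only when the fingerprint changes, so a committed cache file is reused on both local runs and CI.

```python
import hashlib
import tempfile
from pathlib import Path

# Hypothetical caching sketch: function and file names are invented.
def ensure_test_schema_dump(source_bytes: bytes, cache_dir: Path) -> Path:
    """Return a cached schema dump, rebuilding only on a fingerprint change."""
    digest = hashlib.sha256(source_bytes).hexdigest()[:12]
    dump = cache_dir / f"test_n3c_{digest}.sql"
    if not dump.exists():                      # cache miss: rebuild (stubbed)
        dump.write_text("-- rebuilt schema dump")
    return dump                                # cache hit: reuse the file

cache = Path(tempfile.mkdtemp())
first = ensure_test_schema_dump(b"source rows...", cache)
second = ensure_test_schema_dump(b"source rows...", cache)
print(first == second)  # True: same fingerprint, no second rebuild
```

The trade-off noted above still applies: the cached dump would need to be committed, or CI rebuilds it from scratch every run anyway.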