Big-Life-Lab/PHES-ODM

Hierarchical foreign keys

draos opened this issue · 7 comments

draos commented

I think that on several tables the foreign key implementing hierarchical self-relationship is declared backwards:

In table datasets, column datasetID is declared as the primary key. There is also a column parDatasetID, presumably denoting the parent dataset. However, one of foreign key constraints is declared as this:
FOREIGN KEY ("datasetID") REFERENCES "datasets"("parDatasetID")

I think that this was meant the other way around - parDatasetID referencing datasetID. BTW, declaration as it is now, fails on, say, PostgreSQL because parDataserID lacks a unique constraint, which targets of foreign key constraints must have.

I see a similar problem on tables protocols and sites, while protocolSteps has the self-referencing foreign key oriented correctly.

I did not check this in the original model in Lucid, but pdf with ERD indicates that the original model has the same issue, that is, that the export was correct.

Or do I miss something?

Thanks for pointing this out and mea culpa for not thoroughly checking the Lucid-generated pseudo-code. We'll adjust and test the code IRL at least in SQL Server and PostgreSQL environments before release. For other flavors (which I don't have access to) maybe there'll be contributions from the community...

draos commented

Thank you, Sorin!

However, I still see in v2-RC2, in generated MySQL script, hierarchical foreign key declaration "upside down":

CREATE TABLE `datasets` (
  `parDatasetID` varchar,
  `datasetID` varchar,
  `datasetDate` datetime,
  `name` varchar,
  `license` varchar,
  `descr` varchar,
  `refLink` varchar,
  `langID` int,
  `funderCont` varchar,
  `custodyCont` varchar,
  `funderID` varchar,
  `custodyID` varchar,
  `lastEdited` datetime,
  `notes` varchar,
  PRIMARY KEY (`datasetID`),
  FOREIGN KEY (`datasetID`) REFERENCES `datasets`(`parDatasetID`)
);

If datasetID is the primary key, and parDatesetID points to its "parent", then the foreign key declaration should be

FOREIGN KEY (parDatasetID) REFERENCES datasets(datasetID)

Dear Dragi. My suspicion is that Lucid has no real notion of the order of elements that should be implemented for the generated code to be functional. Instead, I believe they generate them in the order in which they were created, hence the chaos in the order of elements, since this was not known or considered when we created the model. We'll do a cleanup before posting the official code, to make sure it will work.

draos commented

Hi, Sorin!

Yes, Lucid has no notion of the correct order of tables and key declarations to export, but the Python script included in the project takes care of that, by moving all foreign key declarations into "alter table" statements at the end of the script. That works.

My complaint is about the "direction" of foreign key declaration in hierarchical tables, such as sample. It should be something like
foreign key fk_xyz (parentID) references xyz(primaryID)
not the other way around.

It is trivial to fix this manually when creating the local database, but I think we should polish the model completely. BTW, the ERD in Lucig looks OK at the first glance, so my suspicion is that the error is with Lucid export.

How are these FK constraints declared in your actual ODMv2 database?

Hmm, my hypothesis goes bust. I tried redefining the connection between the siteID and parSiteID by going first from siteID to parSiteID and then in the reverse direction (of course, maintaining the visual representation of the relationship, which means changing the endpoint icon in each case). The generated code stayed the same. So it's not the order of the "drawing" of the connection. Could it be the generation timestamp of the fields? I dunno. I need to dig deeper to control the situation...

draos commented

I created a trivial model, and it looks to me it exports correctly (as far as I understand "crow feet" notation - I am so old that I prefer CODASYL notation with an arrow from foreign key to the corresponding primary or unique...)
This is the toy model: https://lucid.app/lucidchart/14c72666-587a-4a13-a727-bba1c4a551bc/view?page=0_0&invitationId=inv_dea16baf-af1d-4887-a73b-644e85ff14f7#

(I don't know how to share it with edit permissions, but it takes a minut to reproduce it.

It generates this PostgreSQL script:

CREATE TABLE `something_1` (
  `somethingID` serial,
  `parentID` integer,
  `attribute` varchar,
  PRIMARY KEY (`somethingID`),
  FOREIGN KEY (`parentID`) REFERENCES `something_1`(`somethingID`)
);

CREATE TABLE `something_2` (
  `somethingID` serial,
  `parentID` integer,
  `attribute` varchar,
  PRIMARY KEY (`somethingID`),
  FOREIGN KEY (`somethingID`) REFERENCES `something_2`(`parentID`)
);

CREATE TABLE `whole` (
  `wholeID` serial,
  `attribute` varchar,
  PRIMARY KEY (`wholeID`)
);

CREATE TABLE `part` (
  `partID` serial,
  `wholeID` integer,
  `attribute` varchar,
  PRIMARY KEY (`partID`),
  FOREIGN KEY (`wholeID`) REFERENCES `whole`(`wholeID`)
);

Judging by this, hierarchical foreign keys in the ODM (e.g. on sample) are "upside down". Perhaps that happened because the primary key happens not to be the first attribute.

Thank you for your effort - I hope I am not being too anal about this... :-D

Thank you for your experiment and don't worry, it's actually important to understand the limitations of the tools we are working with. It would have been wonderful to have this one generating "picture perfect" code, but as is, it means we still need to invest some elbow grease to get usable results. Good to know. Thanks again!