erasmus-without-paper/ewp-specs-api-omobilities

Separation of natural and surrogate keys?

Closed this issue · 24 comments

kaiqu commented

It seems a bit overkill to me to have both "los-id", "loi-id" and "component-code". In component-studied, los-id seems to be = component-code plus a type prefix (for Course, I presume), while loi-id seems to be some other human-readable code. In component-recognized, los-id and loi-id seem to be global, artificial ID's.
In the model, LOS Code is meant to be a human-readable code (like component-code), and LOI is identified by that plus Academic Term. Are artificial ID's necessary at all?

kaiqu commented

Never mind - I see this is explained in the Courses README...

kaiqu commented

Ok - I see the need for the extra identifiers, but I don't think "component-code" is a good name: LA Component is the combination of a LOI and a LA, but the code identifies a LOS, and should be called los-code, in my opinion. Or is it important to conform to the EUC template on this, @wrygiel ?

Or is it important to conform to the EUC template on this, @wrygiel ?

Not necessarily. We could call it los-code too.

We will have a hard time explaining the difference between los-id and los-code to all the partners... I'm still trying to come up with a better solution for this problem. One of the alternatives to consider is including the LOS type in the LOS ID, e.g. Course/ABC where ABC would be the "component code" to be displayed to the user. XSD could verify that all LOS IDs begin with LOS type. I didn't think it through yet, but perhaps this would be better than the currently proposed solution. Do you think this could work?

kaiqu commented

We will have a hard time explaining the difference between los-id and los-code to all the partners...

How about a naming convention where "-id" means an artificial key (system-generated and not necessarily human-readable) and "-code" means a natural one (i.e. human-readable and possibly represented in a SIS)? From what I've seen, this seems to be a common pattern in both systems and literature...

I'm still trying to come up with a better solution for this problem. One of the alternatives to consider is including the LOS type in the LOS ID, e.g. Course/ABC where ABC would be the "component code" to be displayed to the user. XSD could verify that all LOS IDs begin with LOS type. I didn't think it through yet, but perhaps this would be better than the currently proposed solution. Do you think this could work?

How about structuring los-id with sub-elements: "type" (i.e. Course) and "los-code"?
From a modeling point of view, this is cleaner.

How about structuring los-id with sub-elements: "type" (i.e. Course) and "los-code"?
From a modeling point of view, this is cleaner.

How to pass them in GET parameter then?

kaiqu commented

How about structuring los-id with sub-elements: "type" (i.e. Course) and "los-code"?
From a modeling point of view, this is cleaner.

How to pass them in GET parameter then?

Ah, there is that... You could split it into two parameters, but that would complicate the interface. I leave the decision to you.

Another solution I think of is to convince partners to use UUIDs. This might be the easiest way to eliminate this issue and similar ones.

E.g. today one of my colleagues rightly noted, that some of our HEIs tend to change their component codes, or even Organization Unit IDs. I believe these are not so isolated cases. However, if we can convince partners to include UUID columns for each of their EWP entities, we will solve all these issues in one move. What do you think?

kaiqu commented

Another solution I think of is to convince partners to use UUIDs. This might be the easiest way to eliminate this issue and similar ones.
E.g. today one of my colleagues rightly noted, that some of our HEIs tend to change their component codes, or even Organization Unit IDs. I believe these are not so isolated cases. However, if we can convince partners to include UUID columns for each of their EWP entities, we will solve all these issues in one move. What do you think?

Yes - with the naming convention I suggested earlier, we would get:

  • los-id: UUID
  • loi-id: UUID
  • los-code: Optional natural key (if the system has such a thing)

Agree?

Sounds okay. How about using UUIDs in ounit-id too?

kaiqu commented

How about using UUIDs in ounit-id too?

Sounds good (with an optional, natural ounit-code as well).

This proposal is a big change as it will probably require adding UUID columns in many tables in all partner databases. @erasmus-without-paper/all-members - what do you think?

UUIDs would fix two separate problems: 1. The problem of many partners not having a common unique ID for LOSes. 2. The problem of some partners not using surrogate keys in their internal IDs.

Still, there are other ways of fixing these problems, and some partners might be against introducing UUIDs.

kaiqu commented

This proposal is a big change as it will probably require adding UUID columns in many tables in all partner databases. @erasmus-without-paper/all-members - what do you think?
UUIDs would fix two separate problems: 1. The problem of many partners not having a common unique ID for LOSes. 2. The problem of some partners not using surrogate keys in their internal IDs.
Still, there are other ways of fixing these problems, and some partners might be against introducing UUIDs.

How about generalizing the ID elements to contain anything, including UUIDs or natural keys? The important thing is that the IDs don't change - if a HEI never changes its (natural) keys, it can send those?

This doesn't fix problem 1 though. Partners would need to fix it themselves (e.g. by introducing UUIDs).

In other words we're struggling between suggesting and enforcing safety measures.

kaiqu commented

All partners must present immutable keys (not just for LOS, but for all references for future use). If they have neither UUIDs nor common (natural or surrogate) keys across tables representing the same "thing", they must supply one or the other - either in extra columns, tables or views. I don't see any way around that.

I think your earlier suggestion to concate natural keys with a "table flag" (e.g. "Course/1234") is good for such cases. Generalized IDs allows for maximum flexibility, to minimize the impact on the partner system.

I think your earlier suggestion to concate natural keys with a "table flag" (e.g. "Course/1234") is good for such cases.

Ok! Let's try this then.

I will also update the examples, so that they use surrogate IDs in all kinds of entities. Non-UUID IDs will still be allowed everywhere, but I guess that examples should be exemplary. :)

I think it's done!

@kaiqu, please take a look at the commits above.

kaiqu commented

Looks good (except see my comment about documenting that IDs must be stable over time).

I believe you mean this comment.

Immutability is an integral part of being a surrogate key, and we already do advise usage of surrogate keys. But perhaps you are right, and we should state the reason (immutability) more clearly. Maybe it even deserves a separate section in Architecture and Security document.

Added some more notes in key places. Also added iia-code and ounit-code (which will work in parallel with existing iia-id and ounit-id, as los-code works in parallel with los-id).

kaiqu commented

Immutability is an integral part of being a surrogate key, and we already do advise usage of surrogate keys. But perhaps you are right, and we should state the reason (immutability) more clearly.

Yes, especially since they can be a mixture of "classical" surrogate keys like serial number, UUIDs, concatenations of natural keys with type labels...

Maybe it even deserves a separate section in Architecture and Security document.

Yes, it should definitely be in an overarching description, as it is a general principle across APIs.

kaiqu commented

Also added iia-code and ounit-code (which will work in parallel with existing iia-id and ounit-id

I think the Institutions API should follow the same pattern, replacing other-id with hei-code.

Yes, it should definitely be in an overarching description, as it is a general principle across APIs.

People tend to forget those when they get into the details. I put a reminder in each of the key places in XSDs instead (there are not so much of them).

I think the Institutions API should follow the same pattern, replacing other-id with hei-code.

This is a little different, because many of the other-id values are in fact immutable (or even surrogate) IDs (e.g. PIC codes). Let's leave it as it is for now.