Separation of natural and surrogate keys?

Question

Separation of natural and surrogate keys?

Closed this issue 8 years ago · 24 comments

It seems a bit overkill to me to have both "los-id", "loi-id" and "component-code". In component-studied, los-id seems to be = component-code plus a type prefix (for Course, I presume), while loi-id seems to be some other human-readable code. In component-recognized, los-id and loi-id seem to be global, artificial ID's.
In the model, LOS Code is meant to be a human-readable code (like component-code), and LOI is identified by that plus Academic Term. Are artificial ID's necessary at all?

Answer 1 · 2017-01-03T09:46:27.000Z

Never mind - I see this is explained in the Courses README...

Answer 2 · 2017-01-03T10:04:05.000Z

Ok - I see the need for the extra identifiers, but I don't think "component-code" is a good name: LA Component is the combination of a LOI and a LA, but the code identifies a LOS, and should be called los-code, in my opinion. Or is it important to conform to the EUC template on this, @wrygiel ?

Answer 3 · 2017-01-04T13:24:40.000Z

Or is it important to conform to the EUC template on this, @wrygiel ?

Not necessarily. We could call it los-code too.

We will have a hard time explaining the difference between los-id and los-code to all the partners... I'm still trying to come up with a better solution for this problem. One of the alternatives to consider is including the LOS type in the LOS ID, e.g. Course/ABC where ABC would be the "component code" to be displayed to the user. XSD could verify that all LOS IDs begin with LOS type. I didn't think it through yet, but perhaps this would be better than the currently proposed solution. Do you think this could work?

Answer 4 · 2017-01-04T14:48:48.000Z

We will have a hard time explaining the difference between los-id and los-code to all the partners...

How about a naming convention where "-id" means an artificial key (system-generated and not necessarily human-readable) and "-code" means a natural one (i.e. human-readable and possibly represented in a SIS)? From what I've seen, this seems to be a common pattern in both systems and literature...

I'm still trying to come up with a better solution for this problem. One of the alternatives to consider is including the LOS type in the LOS ID, e.g. Course/ABC where ABC would be the "component code" to be displayed to the user. XSD could verify that all LOS IDs begin with LOS type. I didn't think it through yet, but perhaps this would be better than the currently proposed solution. Do you think this could work?

How about structuring los-id with sub-elements: "type" (i.e. Course) and "los-code"?
From a modeling point of view, this is cleaner.

Answer 5 · 2017-01-04T15:11:32.000Z

How about structuring los-id with sub-elements: "type" (i.e. Course) and "los-code"?
From a modeling point of view, this is cleaner.

How to pass them in GET parameter then?

Answer 6 · 2017-01-04T15:48:58.000Z

How about structuring los-id with sub-elements: "type" (i.e. Course) and "los-code"?
From a modeling point of view, this is cleaner.

How to pass them in GET parameter then?

Ah, there is that... You could split it into two parameters, but that would complicate the interface. I leave the decision to you.

Answer 7 · 2017-01-04T15:57:47.000Z

Another solution I think of is to convince partners to use UUIDs. This might be the easiest way to eliminate this issue and similar ones.

E.g. today one of my colleagues rightly noted, that some of our HEIs tend to change their component codes, or even Organization Unit IDs. I believe these are not so isolated cases. However, if we can convince partners to include UUID columns for each of their EWP entities, we will solve all these issues in one move. What do you think?

Answer 8 · 2017-01-04T16:26:07.000Z

Another solution I think of is to convince partners to use UUIDs. This might be the easiest way to eliminate this issue and similar ones.
E.g. today one of my colleagues rightly noted, that some of our HEIs tend to change their component codes, or even Organization Unit IDs. I believe these are not so isolated cases. However, if we can convince partners to include UUID columns for each of their EWP entities, we will solve all these issues in one move. What do you think?

Yes - with the naming convention I suggested earlier, we would get:

los-id: UUID
loi-id: UUID
los-code: Optional natural key (if the system has such a thing)

Agree?

Answer 9 · 2017-01-04T16:46:41.000Z

Sounds okay. How about using UUIDs in ounit-id too?

Answer 10 · 2017-01-05T09:12:23.000Z

How about using UUIDs in ounit-id too?

Sounds good (with an optional, natural ounit-code as well).

Answer 11 · 2017-01-05T10:20:53.000Z

This proposal is a big change as it will probably require adding UUID columns in many tables in all partner databases. @erasmus-without-paper/all-members - what do you think?

UUIDs would fix two separate problems: 1. The problem of many partners not having a common unique ID for LOSes. 2. The problem of some partners not using surrogate keys in their internal IDs.

Still, there are other ways of fixing these problems, and some partners might be against introducing UUIDs.

Answer 12 · 2017-01-05T12:03:06.000Z

This proposal is a big change as it will probably require adding UUID columns in many tables in all partner databases. @erasmus-without-paper/all-members - what do you think?
UUIDs would fix two separate problems: 1. The problem of many partners not having a common unique ID for LOSes. 2. The problem of some partners not using surrogate keys in their internal IDs.
Still, there are other ways of fixing these problems, and some partners might be against introducing UUIDs.

How about generalizing the ID elements to contain anything, including UUIDs or natural keys? The important thing is that the IDs don't change - if a HEI never changes its (natural) keys, it can send those?

Answer 13 · 2017-01-05T12:04:57.000Z

This doesn't fix problem 1 though. Partners would need to fix it themselves (e.g. by introducing UUIDs).

Answer 14 · 2017-01-05T12:06:00.000Z

In other words we're struggling between suggesting and enforcing safety measures.

Answer 15 · 2017-01-05T12:27:17.000Z

All partners must present immutable keys (not just for LOS, but for all references for future use). If they have neither UUIDs nor common (natural or surrogate) keys across tables representing the same "thing", they must supply one or the other - either in extra columns, tables or views. I don't see any way around that.

I think your earlier suggestion to concate natural keys with a "table flag" (e.g. "Course/1234") is good for such cases. Generalized IDs allows for maximum flexibility, to minimize the impact on the partner system.

Answer 16 · 2017-01-09T08:19:42.000Z

I think your earlier suggestion to concate natural keys with a "table flag" (e.g. "Course/1234") is good for such cases.

Ok! Let's try this then.

Answer 17 · 2017-01-09T12:26:44.000Z

I will also update the examples, so that they use surrogate IDs in all kinds of entities. Non-UUID IDs will still be allowed everywhere, but I guess that examples should be exemplary. :)

Answer 18 · 2017-01-09T13:04:37.000Z

I think it's done!

@kaiqu, please take a look at the commits above.

Answer 19 · 2017-01-09T15:55:05.000Z

Looks good (except see my comment about documenting that IDs must be stable over time).

Answer 20 · 2017-01-09T19:46:20.000Z

I believe you mean this comment.

Immutability is an integral part of being a surrogate key, and we already do advise usage of surrogate keys. But perhaps you are right, and we should state the reason (immutability) more clearly. Maybe it even deserves a separate section in Architecture and Security document.

Answer 21 · 2017-01-09T20:59:07.000Z

Added some more notes in key places. Also added iia-code and ounit-code (which will work in parallel with existing iia-id and ounit-id, as los-code works in parallel with los-id).

Answer 22 · 2017-01-10T09:16:02.000Z

Immutability is an integral part of being a surrogate key, and we already do advise usage of surrogate keys. But perhaps you are right, and we should state the reason (immutability) more clearly.

Yes, especially since they can be a mixture of "classical" surrogate keys like serial number, UUIDs, concatenations of natural keys with type labels...

Maybe it even deserves a separate section in Architecture and Security document.

Yes, it should definitely be in an overarching description, as it is a general principle across APIs.

Answer 23 · 2017-01-10T09:27:23.000Z

Also added iia-code and ounit-code (which will work in parallel with existing iia-id and ounit-id

I think the Institutions API should follow the same pattern, replacing other-id with hei-code.

Answer 24 · 2017-01-10T09:50:27.000Z

Yes, it should definitely be in an overarching description, as it is a general principle across APIs.

People tend to forget those when they get into the details. I put a reminder in each of the key places in XSDs instead (there are not so much of them).

I think the Institutions API should follow the same pattern, replacing other-id with hei-code.

This is a little different, because many of the other-id values are in fact immutable (or even surrogate) IDs (e.g. PIC codes). Let's leave it as it is for now.