mekomsolutions/openmrs-module-initializer

Full specified names UUIDs to be seeded from their concept's UUID.

mks-d opened this issue ยท 17 comments

mks-d commented

Background

See

Proposed Solution

By default fully specified names should be set a UUID that can be seeded and inferred from their concept's UUID. This would happen around here and go along the lines of:

import java.util.UUID;
...
String seed = concept.getUuid() + "_" + locale.toString();
conceptName.setUuid(UUID.nameUUIDFromBytes(seed.getBytes()).toString());

This would ensure that, in each locale, each FSN is always set a UUID in the exact same and predictable way.

Credits to @rbuisson for sharing his examples here and here.

Acceptance Criteria

This enhancement should come with a Liquibase changeset that applies the above seeding logic to existing FSNs. It is possible to apply Java-based logic through a changeset, see <customChange/>.

This is an interesting idea, but I think we need to be very careful. A few things to watch out for:

  1. There may be other tools and processes at use within an implementation that would not respond well to having uuids of existing objects changed.
  2. The assumption that a UUID is unique and would never been issued twice to two different entities is something we rely on heavily in OpenMRS, and any break from a well-established and trusted UUID-generation algorithm should be treated extremely cautiously.
  3. Following from the above point, the implementation above would not be sufficient as many concept names may exist for the same concept with the same locale. Even if we accounted for things like name type and locale_preferred, we then need to think about accounting for voided names, or even purging and recreating names.

I do think if we are careful enough, this could work well. Basically we need to think of it the same way we would think of the implementation of hashcode/equals methods - make sure all fields that make a particular concept name unique are accounted for.

Beyond this, the most reliable fix to the issue you are facing is to get support for uuid into the domain, and then have a script or tool that one could use to update their concepts.csv file with the uuid of the existing concept in their dictionary, as a one-time step, and then populating this at the time of authoring moving forward.

mks-d commented

Thanks @mseaton.

Basically we need to think of it the same way we would think of the implementation of hashcode/equals methods - make sure all fields that make a particular concept name unique are accounted for.

So the seed string should be a concatenation of the properties that make the name unique:

  • The name itself (as string)
  • The name type (always FSN)
  • The locale

?

If we limit this to FSNs (and that's the intent), the pair name as string + locale should be a unique seed no?


(We may indeed end up doing what you suggested, I am definitely not ruling it out, aka implementing a small piece of #136, just the support for the Fully specified name:en:UUID header and then a script to migrate concept names as a one off.)

So, I actually implemented basically this same idea in the OCL module, though for a few limited cases.

It's worth pointing out that the nameUUIDFromBytes() function has roughly the same uniqueness properties (assuming that it's properly done) as random UUIDs (and, more importantly, actually can't duplicate any existing random UUIDs because the UUID variant is encoded in the bytes that make up the UUID).

The actual algorithm recommends having a namespace-specific UUID that's concatenated to the name, so, assuming we're happy with having a single constant for this, we might have an Iniz concept name UUID (say "4646f1d8-6145-40c4-9142-d81fededb5f7") and prepend that to the seed to generate the UUID.

Seed-wise, I think:

  • Concept UUID (since concept name is metadata about the concept)
  • Name itself
  • Name type (if necessary)
  • Locale

are probably unique enough. (Is there any instance where it's sensible to retire a name and then recreate the exact same name for the same concept with the same name type?)

This enhancement should come with a Liquibase changeset that applies the above seeding logic to existing FSNs.

Like @mseaton I'd be a bit hesitant to apply something like that to an existing database and probably not as a default.

Somewhat tangential, but I've been thinking that having some kind of required implementation identifier would be really helpful to deal with these kinds of namespacing concerns (and I have a bunch in the FHIR module).

mks-d commented

Like @mseaton I'd be a bit hesitant to apply something like that to an existing database and probably not as a default.

I'm curious to know of practical cases that would forbid this, @mseaton @ibacher could you share some?

I'm curious to know of practical cases that would forbid this, @mseaton @ibacher could you share some?

The sync module is the place where I'd most be concerned about this, but that same issue may present elsewhere - namely that concept names are identified by uuids in some other serialized data representation or integration, and changing them under the hood may impact these tools. That said, anyone already using Iniz for managing concepts is already in a situation where concept names uuid may change under the hood on them at any time, as they are recreated with random uuids anytime the concept is edited, so for those implementations this likely is fine.

So, I actually implemented basically this same idea in the OCL module, though for a few limited cases.

This is great. I'm not opposed to generating UUIDs per se, just that we design out the algorithm to be as solid as possible and ensure uniqueness.

If we limit this to FSNs (and that's the intent), the pair name as string + locale should be a unique seed no?

Per the above comments, I just think if we're going to go to the trouble to implement this, we should not assume it will just be used for FSNs, but we should make it robust enough to use for any concept names that exist in the dictionary.

Unfortunately the problem with that is that it makes it impossible to change any details about the concept name after it is created, and retain the same generated uuid. So it means that changing a concept from a FSN to a Synonym, or changing the locale (to, say "en_US" from "en") would result in a different uuid, rather than an edit to an existing uuid. I assume this is why you have the suggestion that you have here to use just the concept uuid in the seed. Hmm..

mks-d commented

Unfortunately the problem with that is that it makes it impossible to change any details about the concept name after it is created, and retain the same generated uuid.

@mseaton is this necessarily a problem, or it's just something that needs to be known about how concept name UUIDs are handled under the hood by Iniz when their generation is delegated to Iniz?

What I mean is that when #136 is completed there will be two situations:

  1. Either the concept name's UUID is actively set by implementors using the appropriate new *:UUID-suffixed column (eg. by using a header such as "Fully specified name:en:UUID");
  2. Or setting the concept name's UUID is left to the new potential default process that we are discussing on this thread.

In the latter case, implementors should live with the fact that modifying characteristics of the concept name will result in the generation of a new name with a new UUID. It feels like a edge-ish case anyway, because if the name value itself is changed or if the name type is changed then personally I think it deserves to be a new concept name with a new UUID altogether. I agree that changing the locale is more tricky, but that is really quite a edge case IMO no?

@mks-d - fair enough. So what would you do with the old Concept Name if it has been in use - void it?

mks-d commented

@mseaton yes, especially if it is a FSN since there can only be one anyway. Before we wonder what to do if it is not an FSN, let's decide whether the scope of this issue remains indeed limited to FSNs? I would say yes, for synonyms and short names, appropriate setting and management of their UUIDs would be handled via the new columns introduced through #136.

@mks-d I can understand that your current usage is limited to FSNs, but I'm not sure I see how you won't have the exact same problem with, e.g., synonyms since the purpose of the CodedOrFreeText class is to let us tie the diagnosis to the exact name that was used to record it.

mks-d commented

@ibacher I guess I'm trying to say that concept name UUIDs should be properly taken care of by using the features that #136 will introduce. That's the recommended good practise. But then, for those who overlooked this, this ticket comes and saves the day for FSNs, but that's rather a backup plan in waiting for #136.

Having said that I'm ok to expand the scope to all name types.

My million dollar question is in regards to the Liquibase changeset that would accompany the code change and that will go and correct the UUIDs of a bunch of existing names for which the pattern is not yet fulfilled. Are we ok with that?

No, because not everyone uses Iniz to manage their concepts, and we don't want those who upgrade Iniz to have their concept names changed under them.

What about an approach that:

  1. Updating the code so that if a FSN is not specified with a UUID, generate the uuid using the algorithm proposed, and then
  2. Create a custom liquibase changeset that simply deletes the existing checksums under the Concepts domain.

This will force all Concepts to be reloaded that are managed via CSVs, and will ensure that their names are recreated using the new algorithm.

@mks-d ?

mks-d commented

@mseaton brilliant idea, thanks.

mks-d commented

Ok this is good to go, and this will apply to both FSNs and short names then. Therefore the seeding will have to be a little different. For example for FSNs:

ConceptNameType type = ConceptNameType.FULLY_SPECIFIED;
String seed = concept.getUuid() + "_" + name + "_" + type + "_" + locale.toString();
conceptName.setUuid(UUID.nameUUIDFromBytes(seed.getBytes()).toString());

@Ruhanga - this issue should be closed, correct? This has been completed and merged? @mks-d ?

Yes indeed @mseaton.