NCEAS/metadig-checks

clean up versioning method of checks and suites

jeanetteclark opened this issue · 5 comments

Description

Currently, checks and suites are versioned using both the filename and the id of the check/suite. I propose we version only through the id field, using major/minor/patch versioning semantics, so that the filename contains only the check name.

If a check is updated, all suites should use the updated version of the check. Any user that wants to use older check versions would be able to do so using git.

With your approval, @mbjones, I'm going to move forward with this: clean up the few duplicate versioned checks in the metadig-checks repo, and update the versions of checks that have changed with the Python migration.
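To make the proposal concrete, here is a minimal sketch of versioning carried only in the id. The id layout `<check name>-<major>.<minor>.<patch>`, the example id, and the helper names `parse_check_id`, `bump`, and `format_check_id` are assumptions for illustration, not the actual metadig-checks convention.

```python
import re
from typing import NamedTuple


class CheckId(NamedTuple):
    """A check name plus its major/minor/patch version."""
    name: str
    major: int
    minor: int
    patch: int


# Assumed (hypothetical) id layout: "<check name>-<major>.<minor>.<patch>"
_ID_PATTERN = re.compile(
    r"^(?P<name>.+)-(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)$"
)


def parse_check_id(check_id: str) -> CheckId:
    """Split an id string into its name and numeric version parts."""
    match = _ID_PATTERN.match(check_id)
    if match is None:
        raise ValueError(f"not a versioned check id: {check_id!r}")
    return CheckId(
        match["name"],
        int(match["major"]),
        int(match["minor"]),
        int(match["patch"]),
    )


def bump(check: CheckId, part: str) -> CheckId:
    """Return a new id with one version part incremented and lower parts reset."""
    if part == "major":
        return CheckId(check.name, check.major + 1, 0, 0)
    if part == "minor":
        return CheckId(check.name, check.major, check.minor + 1, 0)
    if part == "patch":
        return CheckId(check.name, check.major, check.minor, check.patch + 1)
    raise ValueError(f"unknown version part: {part!r}")


def format_check_id(check: CheckId) -> str:
    """Render the id back into its string form."""
    return f"{check.name}-{check.major}.{check.minor}.{check.patch}"
```

Under this assumption the filename would carry only the name part (for example `resource.title.present.xml`), while the version lives in the id inside the file.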

In general, I like this approach of semantic versioning by ID.

> If a check is updated, all suites should use the updated version of the check.

I'm pretty sure we already have operational suites using different versions of the same check. ESS-DIVE's suite comes to mind, for example. Before you make this decision, please compare the current suites in use and make sure there is a mechanism for them to all stay operational with the version of the check they are each using. I would express the requirement as:

  • Requirement: each suite can independently specify use of any version of any released check at runtime, without reinstalling the software or a new set of checks
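One way to read this requirement is sketched below: every released version of a check stays registered under its own id, and each suite lists the exact ids it wants, so lookup happens at run time. The registry contents, the suite names, and `run_suite` are hypothetical and only illustrate the idea; they are not the metadig engine's API.

```python
from typing import Callable, Dict, List

# A check takes a metadata document (here just a string) and returns pass/fail.
CheckFn = Callable[[str], bool]

# Hypothetical registry: both versions of the same check remain available.
REGISTRY: Dict[str, CheckFn] = {
    "title.present-1.0.0": lambda doc: "<title>" in doc,
    "title.present-2.0.0": lambda doc: "<title>" in doc
    and "<title></title>" not in doc,
}

# Hypothetical suites: each one pins the exact check ids it uses.
SUITES: Dict[str, List[str]] = {
    "suite.a": ["title.present-1.0.0"],
    "suite.b": ["title.present-2.0.0"],
}


def run_suite(suite_id: str, document: str) -> Dict[str, bool]:
    """Run every check pinned by the suite and report results keyed by check id."""
    results: Dict[str, bool] = {}
    for check_id in SUITES[suite_id]:
        try:
            check = REGISTRY[check_id]
        except KeyError:
            raise LookupError(f"suite {suite_id!r} pins unknown check {check_id!r}")
        results[check_id] = check(document)
    return results
```

With this layout, two suites can disagree about the same document because each one resolves its own pinned version, and dropping an old version from the registry breaks any suite that still pins it.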

I think in order to meet that requirement we will have to include each version of the check as a separate file within the metadig-checks repo.

What is the intention behind having each suite be able to run any version of a check at runtime? Why would we want someone running an outdated check? Some past versions of checks are plain wrong or don't work; we updated them for a reason. Many of the Python checks have been upgraded to not use the `unicode` type, since in Python 3 all strings are Unicode and the separate `unicode` type no longer exists. If someone is running old versions of Python checks on the new system they are going to have a bad time.
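As a small illustration of the `unicode` point, the sketch below shows what a text test looks like under Python 3. The helper names are hypothetical, and the commented-out line only suggests the kind of Python 2 code an old check might contain.

```python
import builtins


def is_text(value: object) -> bool:
    """Python 3 style: str is already Unicode, so one isinstance test suffices."""
    return isinstance(value, str)


def unicode_builtin_exists() -> bool:
    """Report whether the interpreter still provides a builtin named 'unicode'."""
    return hasattr(builtins, "unicode")


# A Python 2 era check might have written something like:
#     isinstance(value, unicode)
# Under Python 3 that line raises NameError, because 'unicode' is not defined.
```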

FWIW, that requirement is currently not satisfied: many changes to checks have gone unversioned, and files have been renamed, versions updated, etc., in such a way that old versions aren't accessible short of installing a different version of the checks repo.

I think the reason multiple versions are in use is that different groups prefer one check behavior over another. I think that is why ESS-DIVE used older check versions, as they didn't like the "fix". One could argue that makes them different checks, not in the same version chain, and so they should simply be named differently (although still very similarly).

okay I actually agree with that - they become different (but very similar) checks. I do think that is what we have currently: two similarly named checks that use slightly different XPath logic for what they are checking.

so my path forward for the Python upgrade checks will be to bump the version but not keep the previous versions (git will have them, obviously).

I changed the check logic in two checks. The first change makes the NSF awards checks compatible with EML 2.2.0; I think this merits a version bump and not a new check. The second adds two HTTP response codes to the URL resolvable check. I'm not sure what to do about that one.

Note that existing checks (linked from current check reports that are archived) are accessible via the API, and should continue to be accessible with their current identifiers so that historical check results that we have stored can be interpreted/understood. The API is meant to be immutable IIRC.

I just cc'ed you on an email about this.