microbiomedata/DataHarmonizer

Update Elevation help text to include units

Closed this issue · 9 comments

Typing "299" into elevation fails on validation.
Typing "299 m" succeeds.

The help description and guidance do not state that a unit is required. This should be specified in the help.

Note that MIxS as-is columns like this generally require the user to specify units, whereas several MIxS modified columns have been created in which a bare value is required, and the inclusion of a unit will fail validation.

This highlights the need to be more consistent with documentation (including the double click help) and validation. I think the idea for the MIxS modified columns was to communicate the required units to the submitter but require numerical entries only.

Since elev is a MIxS as-is column, specified in the mixs_packages_x_slots tab, it inherits all attributes from MIxS, including the description, DH guidance (from LinkML comments), and the required range, QuantityValue, which is converted into the regular expression ^[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)? \S+$, where the trailing " \S+" means a space and then at least one other non-whitespace character.
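To make the behavior concrete, here is a minimal Python sketch that applies the regular expression quoted above to a few sample entries. The helper name is_valid_quantity and the sample values other than "299" and "299 m" are illustrative assumptions, and this only approximates what DataHarmonizer does, since it validates in the browser rather than with Python's re module.

```python
import re

# Pattern quoted in this issue for the QuantityValue range.
QUANTITY_VALUE = re.compile(r"^[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)? \S+$")


def is_valid_quantity(text: str) -> bool:
    """Return True if text is a number, one space, then a unit token (hypothetical helper)."""
    return QUANTITY_VALUE.match(text) is not None


# "299" and "299 m" come from the issue; the other values are illustrative.
for candidate in ["299", "299 m", "299m", "-1.5e3 ft", "299 m asl"]:
    print(f"{candidate!r}: {is_valid_quantity(candidate)}")
```

Under this pattern any single whitespace-free token is accepted as the unit, so "299 ft" passes just as "299 m" does, while a bare "299" is rejected because the space and unit are mandatory.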

Solving this issue will require either

  • moving a static definition, in its entirety, for elev to the existing mixs_modified_slots tab, where it could get out of sync with the MIxS standard
  • using some combination of the --remove and --add subcommands from linkml-apply
  • developing a new minimal modification method (as part of schemasheets @cmungall ?), in which you can say please make just these N modifications to

Adding to the current sprint per the task list from the subport squad. Is this reasonable @turbomam?

I am going to take over this issue.
If I remember correctly, the decision is elevation should always be in meters? Or will we accept ft?

If ONLY meters, we need to make the validation allow for the number only. And update the text to reflect this.
I can update the guidance & example. @turbomam can you update the validation rule?

Unless we want to allow ft? If we do, I'll just update the text. No validation change needed.

Thanks @mslarae13

I think there are several questions to consider for elevation and other measurement-like slots, to maximize consistent slot definitions. Let's try to minimize the number of patterns we use, and write really consistent annotations (comments, description, examples, etc.)

  • does slot X take a numerical value only, or does it require a numerical value and a unit?
  • if units are required, do we allow any units, or only one specified unit?
  • is a range of values allowed?
  • for ranges, what are the allowed delimiters? "-"? " to "?
  • are we anticipating that all expressions of ranges and units should be parse-able by quantulum3? That wasn't looking very promising for the wide range of water content examples.
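As a rough illustration of the choices listed above, the following Python sketch defines one candidate pattern per option and reports which ones a given entry satisfies. All pattern names, the classify helper, and the sample values are hypothetical; the numeric part is borrowed from the QuantityValue expression quoted earlier in this issue, and none of these patterns is an agreed NMDC or MIxS rule.

```python
import re

# Numeric part reused from the QuantityValue pattern quoted earlier in this issue.
NUMBER = r"[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?"

# Hypothetical candidate patterns, one per option under discussion.
PATTERNS = {
    "bare_number": re.compile(rf"^{NUMBER}$"),              # numerical value only
    "number_any_unit": re.compile(rf"^{NUMBER} \S+$"),      # value plus any unit token
    "number_meters": re.compile(rf"^{NUMBER} m$"),          # value plus one fixed unit
    "range_any_unit": re.compile(rf"^{NUMBER}(?:-| to ){NUMBER} \S+$"),  # "-" or " to "
}


def classify(text: str) -> list[str]:
    """Return the names of every candidate pattern that text satisfies."""
    return [name for name, pattern in PATTERNS.items() if pattern.match(text)]


for candidate in ["299", "299 m", "299 ft", "10-20 cm", "10 to 20 cm"]:
    print(f"{candidate!r}: {classify(candidate)}")
```

Running a slot's existing examples through something like this would show quickly which pattern each slot already follows, which could feed the table of current slot criteria.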

I will generate a table of Biosample-related slots that illustrates the current state of the criteria above.

Thanks @turbomam !

Should we talk about this on the 3rd? or on the 10th?

@ssarrafan overdue, please add to January 2023 sprint

@mslarae13 moving to the next sprint, let me know if you don't plan to work on it in the next few weeks please

Elevation should always be reported in meters. Mark and I met today and updated oxy_stat_sample & elev.