zephyr-data-specs/GMNS

๐Ÿ› Need to clarify if `required` means a required column OR also required valid data within column

Opened this issue · 3 comments

e-lo commented

As an implementer of GMNS, I'd like to understand whether the `required` constraint applies to values or columns (or both).

The Frictionless spec only seems to check for column presence and allows missing values.
If we don't want missing values, then we need to assert a `pattern`, `enum`, or other constraint.

Our intent was to use the term `required` to require column presence and prohibit missing values. I think this lines up with the definition used in the Frictionless Table Schema constraints:

| Property | Type | Applies to | Description |
|---|---|---|---|
| `required` | boolean | All | Indicates whether this field cannot be `null`. If `required` is `false` (the default), then `null` is allowed. See the section on `missingValues` for how, in the physical representation of the data, strings can represent `null` values. |
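To make the distinction in the definition above concrete, here is a minimal sketch, using only the standard library, of the two-step reading of the spec: raw strings listed in `missingValues` are first read as null, and `required` then forbids null values. Column presence is reported as a separate, header-level check. The helper `check_required` and the sample columns are hypothetical illustrations, not part of the frictionless package's API.

```python
from typing import Iterable, Sequence


# Hypothetical helper, NOT the frictionless API: a stdlib-only model of how
# Table Schema's `required` constraint interacts with `missingValues`.
def check_required(
    header: Sequence[str],
    rows: Iterable[Sequence[str]],
    field: str,
    missing_values: Sequence[str] = ("",),
) -> list[str]:
    """Return human-readable errors for a field declared `required: true`."""
    if field not in header:
        # Column absent entirely: a header/schema problem, reported separately
        # from null values (the quoted definition of `required` is about nulls).
        return [f"missing column: {field}"]
    idx = header.index(field)
    errors = []
    for row_number, row in enumerate(rows, start=1):
        raw = row[idx]
        # Step 1: raw strings listed in missingValues are read as null.
        value = None if raw in missing_values else raw
        # Step 2: `required` forbids null values.
        if value is None:
            errors.append(f"row {row_number}: {field} is null")
    return errors


header = ["node_id", "name"]
rows = [["1", "Main St"], ["2", ""]]
print(check_required(header, rows, "name"))                     # "" is read as null
print(check_required(header, rows, "name", missing_values=()))  # "" is a real value
print(check_required(header, rows, "zone_id"))                  # column not present
```

Under this reading, whether an empty string violates `required` depends entirely on whether `""` appears in `missingValues`.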

Is the issue actually in the frictionless Python package's implementation of the term?

@dtemkin-volpe, flagging this for your work with frictionless. I know the Python package has been updated since I made this comment last year; has that changed what it means by "required"?

From what I can tell, "required" just means that the field can't be null, and `missingValues` is an array of values that the frictionless Python package treats as null when it processes the data. For example, we list an empty string as a missing value in the node table, and you can see in the `cambridge_intersection.sqlite` file that these are stored as null values, not as empty strings. So, if `required` is true and `""` is in `missingValues`, then `""` can't be a valid value for that field (or at the very least, it will raise an error when we try to create an SQLite database).
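A minimal sketch of the SQLite behavior described above, assuming that a `required` field is mapped to a `NOT NULL` column when the database is built and that `""` is the only entry in `missingValues`. The two-column `node` table and the `load_nodes` / `to_db_value` helpers are hypothetical simplifications, not the actual GMNS schema or tooling.

```python
import sqlite3

# Assumption: mirrors `"missingValues": [""]` in the (hypothetical) schema.
MISSING_VALUES = {""}


def to_db_value(raw: str):
    """Convert a raw CSV string to a DB value, storing missing values as NULL."""
    return None if raw in MISSING_VALUES else raw


def load_nodes(rows):
    """Load (node_id, name) string pairs into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: both fields treated as `required`, hence NOT NULL.
    conn.execute("CREATE TABLE node (node_id INTEGER NOT NULL, name TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO node (node_id, name) VALUES (?, ?)",
        [(to_db_value(node_id), to_db_value(name)) for node_id, name in rows],
    )
    return conn


conn = load_nodes([("1", "Main St")])
print(conn.execute("SELECT node_id, name FROM node").fetchall())

try:
    # The empty name becomes NULL, so the NOT NULL constraint rejects the row.
    load_nodes([("1", "Main St"), ("2", "")])
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```

Because the empty string is converted to `NULL` before insertion, it never reaches the table as `""`; the `NOT NULL` constraint is what surfaces the `required` violation as an error.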
