NASA-PDS/registry-mgr

Missing Science_Facets fields definitions in registry schema


Describe the bug
Identified by @rchenatjpl

In the bundle.xml, I commented out (marked by <!--RC) three parts that caused registry-manager load-data to fail. For the first two, I got:
% registry-manager load-data -file /tmp/harvOut
Elasticsearch URL: http://localhost:9200
Index: registry
Updating schema with fields from /tmp/harvOut/fields.txt
[ERROR] Could not find datatype for field 'pds/Science_Facets/pds/facet1'
I don't know if that's something the user should fix or if it's a software bug.

The third problematic part involves is_facility and is_telescope.

Is this a bug, or is the user supposed to modify registry-manager/elastic/registry.json (or some other file)?
% registry-manager load-data -file /tmp/harvOut
Elasticsearch URL: http://localhost:9200
Index: registry
Updating schema with fields from /tmp/harvOut/fields.txt
[ERROR] Could not find datatype for field 'ref_lid_facility'

From @rchenatjpl:

I don't fully understand what this means, but maybe it's causing a disconnect between the .xsd and the .json. https://pds.nasa.gov/pds4/pds/v1/PDS4_PDS_1E00.xsd says:
<xs:complexType name="Science_Facets">
<xs:annotation>
<xs:documentation>
The Science_Facets class contains the science-related search facets. It is optional and may be repeated if an product has facets related to, for example, two different disciplines (as defined by the discipline_name facet). Note that Science_Facets was modeled with Discipline_Facets as a component and Discipline_Facets was modeled with Group_Facet1 and Group_Facet2 as components. This dependency hierarchy was flattened and only Science_Facets exists in the schema.
</xs:documentation>
</xs:annotation>
<xs:sequence>
<xs:element name="wavelength_range" nillable="true" type="pds:wavelength_range" minOccurs="0" maxOccurs="unbounded"> </xs:element>
<xs:element name="domain" type="pds:domain" minOccurs="0" maxOccurs="unbounded"> </xs:element>
<xs:element name="discipline_name" type="pds:ASCII_Short_String_Collapsed" minOccurs="1" maxOccurs="1"> </xs:element>
<xs:element name="facet1" type="pds:ASCII_Short_String_Collapsed" minOccurs="0" maxOccurs="1"> </xs:element>
...
However, searching for "facet1" in https://pds.nasa.gov/pds4/pds/v1/PDS4_PDS_JSON_1E00.JSON finds it within Group_Facet1, not Science_Facets.
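A hypothetical sketch of how that disconnect could produce the load-data error. It assumes the datatype table is keyed by "<ns>/<Class>/<ns>/<attribute>" paths built from the class that owns each attribute in the JSON model, while harvest emits the path as it appears in the flattened XML label. The names DATATYPES, field_path and lookup_datatype are illustrative only, not registry-manager's actual code.

```python
# Hypothetical sketch; NOT registry-manager's real implementation.

def field_path(ns, cls, attr):
    """Build a field name of the form '<ns>/<Class>/<ns>/<attribute>'."""
    return f"{ns}/{cls}/{ns}/{attr}"

# Assumed datatype table derived from the JSON IM, where facet1 is
# attached to Group_Facet1 rather than Science_Facets.
DATATYPES = {
    field_path("pds", "Group_Facet1", "facet1"): "ASCII_Short_String_Collapsed",
}

def lookup_datatype(field):
    """Return the datatype for a field, or fail like load-data does."""
    try:
        return DATATYPES[field]
    except KeyError:
        raise LookupError(f"Could not find datatype for field '{field}'") from None

# The label (and therefore fields.txt) uses the flattened Science_Facets path.
label_field = "pds/Science_Facets/pds/facet1"
try:
    lookup_datatype(label_field)
except LookupError as err:
    print(err)  # Could not find datatype for field 'pds/Science_Facets/pds/facet1'
```

Under these assumptions the attribute exists in the model, but under a different class path, so an exact-match lookup misses it.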

From @tdddblog:

We are trying to understand how to determine, from the “PDS4_PDS_JSON_1E00.JSON” file, that the "0001_NASA_PDS_1.pds.Science_Facets" class has a “pds.facet1” attribute.

This is the description of the "0001_NASA_PDS_1.pds.Science_Facets" class:
"The Science_Facets class contains the science-related search facets. It is optional and may be repeated if an product has facets related to, for example, two different disciplines (as defined by the discipline_name facet). Note that Science_Facets was modeled with Discipline_Facets as a component and Discipline_Facets was modeled with Group_Facet1 and Group_Facet2 as components. This dependency hierarchy was flattened and only Science_Facets exists in the schema."

I don’t think this “flattening” is reflected anywhere in “PDS4_PDS_JSON_1E00.JSON” file.

There is another class which is not flattened, "0001_NASA_PDS_1.pds.Primary_Result_Summary". It contains "0001_NASA_PDS_1.pds.Science_Facets" as a component.

How can we tell that "0001_NASA_PDS_1.pds.Science_Facets" should be flattened but "0001_NASA_PDS_1.pds.Primary_Result_Summary" should not?
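Since nothing in the JSON marks a class as flattened, a consumer would have to hard-code that knowledge. This sketch uses a simplified, hypothetical class table. The chain Science_Facets, Discipline_Facets, Group_Facet1/Group_Facet2 follows the XSD documentation, but placing discipline_name in Discipline_Facets and the subfacets in the group classes is an assumption, and Primary_Result_Summary's own attributes are omitted. CLASSES, FLATTENED, collect_attributes and field_paths are illustrative names.

```python
# Simplified, assumed fragment of the IM class hierarchy.
CLASSES = {
    "Primary_Result_Summary": {"attributes": [], "components": ["Science_Facets"]},
    "Science_Facets": {"attributes": ["wavelength_range", "domain"],
                       "components": ["Discipline_Facets"]},
    "Discipline_Facets": {"attributes": ["discipline_name"],
                          "components": ["Group_Facet1", "Group_Facet2"]},
    "Group_Facet1": {"attributes": ["facet1", "subfacet1"], "components": []},
    "Group_Facet2": {"attributes": ["facet2", "subfacet2"], "components": []},
}

# The IM carries no "flattened" indicator, so the set is hard-coded here.
FLATTENED = {"Science_Facets"}

def collect_attributes(cls):
    """Attributes of cls plus those of all its (transitive) components."""
    attrs = list(CLASSES[cls]["attributes"])
    for comp in CLASSES[cls]["components"]:
        attrs.extend(collect_attributes(comp))
    return attrs

def field_paths(cls, ns="pds"):
    """Field paths for cls; a flattened class absorbs its components' attributes."""
    if cls in FLATTENED:
        attrs = collect_attributes(cls)
    else:
        attrs = CLASSES[cls]["attributes"]
    return [f"{ns}/{cls}/{ns}/{a}" for a in attrs]

for path in field_paths("Science_Facets"):
    print(path)
```

With this table, Science_Facets yields seven paths, including pds/Science_Facets/pds/facet1, while Primary_Result_Summary stays hierarchical because it is not in FLATTENED.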

From @jshughes:

You are correct, the “flattening” is not recorded in the IM.

The proposer of the <Primary_Result_Summary> class initially submitted a flat model for the PDS4 XML label. However, she also added dependency requirements, resulting in a hierarchical model. To handle the anomaly, I added special code to IMTool/LDDTool that flattens the model for the XML serialization. It has been a problem area ever since, and I would not let it happen again.

We could probably submit a change request to add indicators that this one structure is being flattened. However, I am not sure that this would help you much.

There might be one alternative that we could look into, but it would definitely require a change request. The class <Primary_Result_Summary> could be flattened in the IM and the dependency requirements implemented as Schematron rules. Again, I am not sure this would actually help you.

From @tdddblog:

For now we can create a custom CSV file and load it. I think only 5 fields need special handling:

pds/Science_Facets/pds/discipline_name
pds/Science_Facets/pds/facet1
pds/Science_Facets/pds/subfacet1
pds/Science_Facets/pds/facet2
pds/Science_Facets/pds/subfacet2
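A minimal sketch for generating such a file. Both the two-column layout (field name, Elasticsearch type) and storing all five fields as "keyword" are assumptions, and the way registry-manager actually loads the file is not shown. build_custom_csv and ASSUMED_ES_TYPE are hypothetical names.

```python
import csv
import io

# The five fields listed above.
SCIENCE_FACETS_FIELDS = [
    "pds/Science_Facets/pds/discipline_name",
    "pds/Science_Facets/pds/facet1",
    "pds/Science_Facets/pds/subfacet1",
    "pds/Science_Facets/pds/facet2",
    "pds/Science_Facets/pds/subfacet2",
]

# Assumption: short string facets map to the Elasticsearch "keyword" type.
ASSUMED_ES_TYPE = "keyword"

def build_custom_csv(fields, es_type=ASSUMED_ES_TYPE):
    """Return CSV text with one 'field,type' row per field (hypothetical layout)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for field in fields:
        writer.writerow([field, es_type])
    return buf.getvalue()

if __name__ == "__main__":
    print(build_custom_csv(SCIENCE_FACETS_FIELDS), end="")
```

Keeping the field list in one place makes it easy to extend if more flattened fields turn up later.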