geosolutions-it/geonode-rndt

Parse and back-map values from XML document

Issue closed · 2 comments

etj commented

Investigate which elements are read from an uploaded XML metadata document and set in the GeoNode model.
Check whether more fields can be imported.
Check for possible improvements within GeoNode, and whether parsing can be refined by other Django apps.

etj commented

Current parsing is performed here:
https://github.com/GeoNode/geonode/blob/3.1/geonode/layers/metadata.py#L72-L117

Parsed fields:

  • metadata language (not data language)
  • hierarchy
  • metadata datestamp
  • title
  • abstract
  • purpose
  • supplemental information
  • temporal extent
  • topic category
  • keywords (thesauri are not parsed)
  • otherconstraints (very naive parsing)
  • lineage

etj commented

  • Remove existing keyword mapping and create a new keyword mapping by parsing keywords again.

    • Thesauri
      Default geonode parser will compare Thesaurus.title against //gmd:thesaurusName/gmd:CI_Citation/gmd:title/gco:CharacterString/text().
      RNDT (applying INSPIRE TG2) metadata may have the thesaurus title expressed as //gmd:thesaurusName/gmd:CI_Citation/gmd:title/gmx:Anchor, so the title xpath may not match.
      Since RNDT assumes that matching is performed through the Thesaurus href and not the title, we'll check:
      • if the Thesaurus title is a CharacterString, we'll use that title
      • if the Thesaurus title is an Anchor, search for a thesaurus so that Thesaurus.about == gmx:Anchor/@href
        • if such a thesaurus exists, create a thesaurus dict having title as the title of the thesaurus in the db, so that when saving the dataset's keywords the thesaurus will be matched.
        • else, use as title gmx:Anchor/text()
    • Keywords
      For the keywords the same considerations as for the thesauri apply.
      Default geonode parser will compare ThesaurusKeyword.alt_label or ThesaurusKeywordLabel.label against //gmd:MD_Keywords/gmd:keyword/gco:CharacterString/text().
      RNDT metadata may have the keyword expressed as //gmd:MD_Keywords/gmd:keyword/gmx:Anchor, so the keyword text xpath may not match.
      Since RNDT assumes that matching is performed through the Keyword href and not the text():
      • if the Keyword is a CharacterString, we'll use that text()
      • if the Keyword is an Anchor, search for a keyword in the declared thesaurus so that ThesaurusKeyword.about == gmx:Anchor/@href
        • if such a keyword exists, add an entry to the given thesaurus dict setting the alt_label of the Keyword found in the DB.
        • else, discard the keyword and log an error (or save it as a free keyword???)
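
    The thesaurus and keyword rules above can be sketched roughly as follows. This is only an illustration under stated assumptions: `resolve_keywords` and `text_or_href` are hypothetical helpers, and the two plain dicts (`thesauri_by_about`, `keywords_by_about`) stand in for the database lookups on `Thesaurus.about` and `ThesaurusKeyword.about`. Unmatched Anchor keywords are returned separately so the caller can decide whether to log or keep them.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
    "gmx": "http://www.isotc211.org/2005/gmx",
}
HREF = "{http://www.w3.org/1999/xlink}href"


def text_or_href(parent):
    """Return (text, href) for the CharacterString or Anchor child of `parent`."""
    cs = parent.find("gco:CharacterString", NS)
    if cs is not None:
        return (cs.text or "").strip(), None
    anchor = parent.find("gmx:Anchor", NS)
    if anchor is not None:
        return (anchor.text or "").strip(), anchor.get(HREF)
    return None, None


def resolve_keywords(md_keywords, thesauri_by_about, keywords_by_about):
    """Resolve one gmd:MD_Keywords element.

    Both dicts are hypothetical stand-ins for DB queries:
    thesauri_by_about maps Thesaurus.about -> Thesaurus.title,
    keywords_by_about maps ThesaurusKeyword.about -> alt_label.
    Returns (thesaurus_title, labels, discarded_hrefs).
    """
    title = None
    title_el = md_keywords.find(
        "gmd:thesaurusName/gmd:CI_Citation/gmd:title", NS)
    if title_el is not None:
        text, href = text_or_href(title_el)
        # Anchor: prefer the DB title matched by href, else the anchor text.
        title = thesauri_by_about.get(href, text) if href else text

    labels, discarded = [], []
    for kw in md_keywords.findall("gmd:keyword", NS):
        text, href = text_or_href(kw)
        if href is None:
            if text:
                labels.append(text)  # plain CharacterString keyword
        elif href in keywords_by_about:
            labels.append(keywords_by_about[href])  # label stored in the DB
        else:
            discarded.append(href)  # unknown URI: caller logs or keeps it
    return title, labels, discarded
```

    Returning the discarded hrefs instead of dropping them silently leaves the open question (discard, or save as a free keyword) to the caller.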
  • Add parsing for access constraints --> custom val
    We have to check for a block like this:

    <gmd:resourceConstraints>
     <gmd:MD_LegalConstraints>
        <gmd:accessConstraints>
           <gmd:MD_RestrictionCode codeList="http://standards.iso.org/iso/19139/resources/gmxCodelists.xml#MD_RestrictionCode"
                                   codeListValue="otherRestrictions"/>
        </gmd:accessConstraints>
        <gmd:otherConstraints>
           <gmx:Anchor xlink:href="http://inspire.ec.europa.eu/metadata-codelist/LimitationsOnPublicAccess/noLimitations">No limitations to public access</gmx:Anchor>
        </gmd:otherConstraints>
     </gmd:MD_LegalConstraints>
    </gmd:resourceConstraints>

    Take the element in xpath:

    gmd:resourceConstraints/gmd:MD_LegalConstraints[gmd:accessConstraints/gmd:MD_RestrictionCode/@codeListValue="otherRestrictions"]/gmd:otherConstraints
    
    • If that element contains a gmx:Anchor, then the format is RNDT compliant:
      • take gmx:Anchor/@xlink:href
      • if the URI exists in the related Thesaurus LimitationsOnPublicAccess, store it in val['constraints_other'];
      • else store the value http://inspire.ec.europa.eu/metadata-codelist/LimitationsOnPublicAccess/noLimitations
    • else a gco:CharacterString should exist
      • store its value in a local variable; it will be reused when parsing useConstraints.
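
    A minimal sketch of the access-constraints rules, assuming the standard library's ElementTree is used. Its XPath subset cannot express the nested attribute predicate shown above, so the filtering is done in Python. `parse_access_constraints` is a hypothetical name, and `known_uris` is a plain set standing in for the LimitationsOnPublicAccess thesaurus lookup.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
    "gmx": "http://www.isotc211.org/2005/gmx",
}
HREF = "{http://www.w3.org/1999/xlink}href"
NO_LIMITATIONS = ("http://inspire.ec.europa.eu/metadata-codelist/"
                  "LimitationsOnPublicAccess/noLimitations")


def parse_access_constraints(root, known_uris):
    """Return (constraints_other, leftover_text).

    known_uris is a hypothetical stand-in for the URIs stored in the
    LimitationsOnPublicAccess thesaurus. leftover_text is the free text
    to reuse later when parsing useConstraints.
    """
    path = ".//gmd:resourceConstraints/gmd:MD_LegalConstraints"
    for legal in root.iterfind(path, NS):
        codes = legal.findall(
            "gmd:accessConstraints/gmd:MD_RestrictionCode", NS)
        if not any(c.get("codeListValue") == "otherRestrictions"
                   for c in codes):
            continue
        for other in legal.findall("gmd:otherConstraints", NS):
            anchor = other.find("gmx:Anchor", NS)
            if anchor is not None:
                # RNDT-compliant: keep the URI only if the thesaurus has it.
                href = anchor.get(HREF)
                value = href if href in known_uris else NO_LIMITATIONS
                return value, None
            cs = other.find("gco:CharacterString", NS)
            if cs is not None and cs.text:
                return None, cs.text.strip()
    return None, None
```

    The first element of the returned pair is what would go into val['constraints_other']; the second is the local value kept for the use-constraints step.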
  • add parsing for use constraints --> custom rndt
    Check for a block:

     <gmd:resourceConstraints>
        <gmd:MD_LegalConstraints>
           ...
           <gmd:useConstraints>
              <gmd:MD_RestrictionCode 
                 codeList="http://standards.iso.org/iso/19139/resources/gmxCodelists.xml#MD_RestrictionCode"
                 codeListValue="otherRestrictions">altri vincoli</gmd:MD_RestrictionCode>
           </gmd:useConstraints>
           <gmd:otherConstraints>
              <gmx:Anchor
                 xlink:href="http://inspire.ec.europa.eu/metadata-codelist/ConditionsApplyingToAccessAndUse/noConditionsApply">Nessuna condizione applicabile</gmx:Anchor>
           </gmd:otherConstraints>
        </gmd:MD_LegalConstraints>
     </gmd:resourceConstraints>

    Take the element in xpath:

    gmd:resourceConstraints/gmd:MD_LegalConstraints[gmd:useConstraints/gmd:MD_RestrictionCode/@codeListValue="otherRestrictions"]/gmd:otherConstraints
    
    • If that element contains a gmx:Anchor
      • take gmx:Anchor/@xlink:href
      • if the URI exists in the related Thesaurus ConditionsApplyingToAccessAndUse, store it in custom {"rndt": {'constraints_other': value}}
      • else store its text() + the text value extracted in "access constraints", if any, in custom {"rndt": {'constraints_other': value}}
    • else a gco:CharacterString should exist
      • store its text() + the text value extracted in "access constraints", if any, in custom {"rndt": {'constraints_other': value}}
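
    The use-constraints rules could look roughly like this. It is a sketch, not the actual implementation: `parse_use_constraints` is a hypothetical name, `known_uris` stands in for the ConditionsApplyingToAccessAndUse thesaurus lookup, and joining the two texts with a single space is an assumption, since the notes above only say "text() + text".

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
    "gmx": "http://www.isotc211.org/2005/gmx",
}
HREF = "{http://www.w3.org/1999/xlink}href"


def parse_use_constraints(root, known_uris, access_text=None):
    """Return the custom dict for use constraints, or {} if none is found.

    known_uris is a hypothetical stand-in for the URIs stored in the
    ConditionsApplyingToAccessAndUse thesaurus; access_text is the free
    text left over from the access-constraints step, if any.
    """
    path = ".//gmd:resourceConstraints/gmd:MD_LegalConstraints"
    for legal in root.iterfind(path, NS):
        codes = legal.findall("gmd:useConstraints/gmd:MD_RestrictionCode", NS)
        if not any(c.get("codeListValue") == "otherRestrictions"
                   for c in codes):
            continue
        for other in legal.findall("gmd:otherConstraints", NS):
            anchor = other.find("gmx:Anchor", NS)
            if anchor is not None and anchor.get(HREF) in known_uris:
                value = anchor.get(HREF)  # URI known to the thesaurus
            else:
                node = anchor
                if node is None:
                    node = other.find("gco:CharacterString", NS)
                if node is None:
                    continue
                # Assumption: join own text and access text with a space.
                parts = [(node.text or "").strip(),
                         (access_text or "").strip()]
                value = " ".join(p for p in parts if p)
            return {"rndt": {"constraints_other": value}}
    return {}
```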
  • add parsing for resolutions --> custom rndt

  • add parsing for accuracy --> custom rndt

For other info about constraints also see #59.