Parse and back-map values from XML document
Closed this issue · 2 comments
Investigate which elements are read from an uploaded XML metadata document and set into the GeoNode model.
Check whether more fields can be imported.
Check for possible improvements within GeoNode, and whether parsing can be refined by other Django apps.
Current parsing is performed here:
https://github.com/GeoNode/geonode/blob/3.1/geonode/layers/metadata.py#L72-L117
Parsed fields:
- metadata language (not data language)
- hierarchy
- metadata datestamp
- title
- abstract
- purpose
- supplemental information
- temporal extent
- topic category
- keywords (thesauri are not parsed)
- otherConstraints (very naive parsing)
- lineage
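To give an idea of the kind of lookup involved, here is a minimal sketch that reads three of the fields above with plain XPath. It is not the GeoNode code linked above: `FIELD_XPATHS` and `read_fields` are hypothetical names, and the XPaths assume an ISO 19139 document with a `gmd:MD_DataIdentification` block.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Hypothetical field -> XPath table covering a few of the fields listed above.
_IDENT = ".//gmd:MD_DataIdentification"
FIELD_XPATHS = {
    "title": _IDENT + "/gmd:citation/gmd:CI_Citation/gmd:title/gco:CharacterString",
    "abstract": _IDENT + "/gmd:abstract/gco:CharacterString",
    "purpose": _IDENT + "/gmd:purpose/gco:CharacterString",
}


def read_fields(xml_text):
    """Return a dict with the fields found in the document; missing ones are skipped."""
    root = ET.fromstring(xml_text)
    values = {}
    for field, xpath in FIELD_XPATHS.items():
        node = root.find(xpath, NS)
        if node is not None and node.text:
            values[field] = node.text.strip()
    return values
```

Keeping the XPaths in a table makes it easy to see at a glance which fields are covered and to add more.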
- Remove existing keyword mapping and create a new keyword mapping by parsing keywords again.
  - Thesauri
    Default GeoNode parser will compare `Thesaurus.title` against `//gmd:thesaurusName/gmd:CI_Citation/gmd:title/gco:CharacterString/text()`.
    RNDT (applying INSPIRE TG2) metadata may have the thesaurus title expressed as `//gmd:thesaurusName/gmd:CI_Citation/gmd:title/gmx:Anchor`, so the title xpath may not match.
    Since RNDT assumes that matching is performed through the Thesaurus `href` and not the title, we'll check:
    - if the Thesaurus title is a `CharacterString`, we'll use that title
    - if the Thesaurus title is an `Anchor`, search for a thesaurus so that `Thesaurus.about == gmx:Anchor/@href`
      - if such a thesaurus exists, create a thesaurus dict having as title the title of the thesaurus in the DB, so that when saving the dataset's keywords the thesaurus will be matched.
      - else, use as title `gmx:Anchor/text()`
  - Keywords
    For the keywords the same considerations as for the thesauri apply.
    Default GeoNode parser will compare `ThesaurusKeyword.alt_label` or `ThesaurusKeywordLabel.label` against `gmd:MD_Keywords/gmd:keyword/gco:CharacterString/text()`.
    RNDT metadata may have the keyword expressed as `//gmd:MD_Keywords/gmd:keyword/gmx:Anchor`, so the keyword text xpath may not match.
    Since RNDT assumes that matching is performed through the Keyword `href` and not the `text()`:
    - if the Keyword is a `CharacterString`, we'll use that `text()`
    - if the Keyword is an `Anchor`, search for a keyword in the declared thesaurus so that `ThesaurusKeyword.about == gmx:Anchor/@href`
      - if such a keyword exists, add an entry to the given thesaurus dict setting the `alt_label` of the Keyword found in the DB.
      - else, discard the keyword and log an error (or save it as a free keyword???)
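The thesaurus and keyword checks above can be sketched roughly as follows. This is only an illustration under a few assumptions: the two `*_by_about` dicts are hypothetical stand-ins for the DB lookups on `Thesaurus.about` and `ThesaurusKeyword.about`, the function names are made up, and the href is read from `xlink:href` as in the sample blocks in this issue.

```python
import logging
import xml.etree.ElementTree as ET

logger = logging.getLogger(__name__)

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
    "gmx": "http://www.isotc211.org/2005/gmx",
    "xlink": "http://www.w3.org/1999/xlink",
}
HREF = "{%s}href" % NS["xlink"]  # ElementTree name of the xlink:href attribute


def resolve_thesaurus_title(title_el, thesauri_by_about):
    """Return the title to use in the thesaurus dict for one gmd:title element.

    thesauri_by_about: hypothetical {Thesaurus.about: Thesaurus.title} lookup.
    """
    cs = title_el.find("gco:CharacterString", NS)
    if cs is not None:
        return (cs.text or "").strip()
    anchor = title_el.find("gmx:Anchor", NS)
    if anchor is None:
        return None
    db_title = thesauri_by_about.get(anchor.get(HREF))
    # Prefer the DB title so the thesaurus is matched on save; else Anchor text.
    return db_title if db_title is not None else (anchor.text or "").strip()


def resolve_keyword(keyword_el, keywords_by_about):
    """Return the label for one gmd:keyword element, or None if it is discarded.

    keywords_by_about: hypothetical {ThesaurusKeyword.about: alt_label} lookup,
    already restricted to the declared thesaurus.
    """
    cs = keyword_el.find("gco:CharacterString", NS)
    if cs is not None:
        return (cs.text or "").strip()
    anchor = keyword_el.find("gmx:Anchor", NS)
    if anchor is None:
        return None
    href = anchor.get(HREF)
    label = keywords_by_about.get(href)
    if label is None:
        # Open question in this issue: discard, or keep as a free keyword?
        logger.error("Keyword %s not found in declared thesaurus, discarding", href)
    return label
```

Both functions take the already-located `gmd:title` / `gmd:keyword` element, so they can be plugged in wherever the parser walks the `gmd:MD_Keywords` blocks.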
- Add parsing for access constraints --> custom `val`
  We have to check for a block like this:

      <gmd:resourceConstraints>
        <gmd:MD_LegalConstraints>
          <gmd:accessConstraints>
            <gmd:MD_RestrictionCode codeList="http://standards.iso.org/iso/19139/resources/gmxCodelists.xml#MD_RestrictionCode" codeListValue="otherRestrictions"/>
          </gmd:accessConstraints>
          <gmd:otherConstraints>
            <gmx:Anchor xlink:href="http://inspire.ec.europa.eu/metadata-codelist/LimitationsOnPublicAccess/noLimitations">No limitations to public access</gmx:Anchor>
          </gmd:otherConstraints>
        </gmd:MD_LegalConstraints>
      </gmd:resourceConstraints>

  Take the element in xpath:
  `gmd:resourceConstraints/gmd:MD_LegalConstraints[gmd:accessConstraints/gmd:MD_RestrictionCode/@codeListValue="otherRestrictions"]/gmd:otherConstraints`
  - If such element contains a `gmx:Anchor`, then the format is RNDT compliant:
    - take `gmx:Anchor/@xlink:href`
    - if the URI exists in the related Thesaurus `LimitationsOnPublicAccess`, store its value in `val['constraints_other']`;
    - else store the value `http://inspire.ec.europa.eu/metadata-codelist/LimitationsOnPublicAccess/noLimitations`
  - else `gco:CharacterString` should exist
    - store its value in a local var, we'll use it back in useConstraints.
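A possible shape for the access constraints step, as a rough sketch only. `parse_access_constraints` and `known_limitation_uris` are hypothetical names; the latter stands in for the URIs stored in the `LimitationsOnPublicAccess` thesaurus. ElementTree does not support path predicates like the xpath above, so the sketch walks the `gmd:MD_LegalConstraints` elements by hand.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
    "gmx": "http://www.isotc211.org/2005/gmx",
    "xlink": "http://www.w3.org/1999/xlink",
}
HREF = "{%s}href" % NS["xlink"]
NO_LIMITATIONS = (
    "http://inspire.ec.europa.eu/metadata-codelist/"
    "LimitationsOnPublicAccess/noLimitations"
)


def parse_access_constraints(root, known_limitation_uris):
    """Return (constraints_other, free_text) from the access constraints block.

    constraints_other is what would go into val['constraints_other'];
    free_text is the CharacterString value kept for the use constraints step.
    """
    for legal in root.iter("{%s}MD_LegalConstraints" % NS["gmd"]):
        code = legal.find("gmd:accessConstraints/gmd:MD_RestrictionCode", NS)
        if code is None or code.get("codeListValue") != "otherRestrictions":
            continue
        other = legal.find("gmd:otherConstraints", NS)
        if other is None:
            continue
        anchor = other.find("gmx:Anchor", NS)
        if anchor is not None:
            # RNDT-compliant form: keep the URI if known, else fall back.
            uri = anchor.get(HREF)
            value = uri if uri in known_limitation_uris else NO_LIMITATIONS
            return value, None
        cs = other.find("gco:CharacterString", NS)
        text = (cs.text or "").strip() if cs is not None else None
        return None, text
    return None, None
```

Returning the free text separately keeps it available for the use constraints parsing without touching `val`.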
- Add parsing for use constraints --> custom `rndt`
  Check for a block:

      <gmd:resourceConstraints>
        <gmd:MD_LegalConstraints>
          ...
          <gmd:useConstraints>
            <gmd:MD_RestrictionCode codeList="http://standards.iso.org/iso/19139/resources/gmxCodelists.xml#MD_RestrictionCode" codeListValue="otherRestrictions">altri vincoli</gmd:MD_RestrictionCode>
          </gmd:useConstraints>
          <gmd:otherConstraints>
            <gmx:Anchor xlink:href="http://inspire.ec.europa.eu/metadata-codelist/ConditionsApplyingToAccessAndUse/noConditionsApply">Nessuna condizione applicabile</gmx:Anchor>
          </gmd:otherConstraints>
        </gmd:MD_LegalConstraints>
      </gmd:resourceConstraints>

  Take the element in xpath:
  `gmd:resourceConstraints/gmd:MD_LegalConstraints[gmd:useConstraints/gmd:MD_RestrictionCode/@codeListValue="otherRestrictions"]/gmd:otherConstraints`
  - If such element contains a `gmx:Anchor`
    - take `gmx:Anchor/@xlink:href`
    - if the URI exists in the related Thesaurus `ConditionsApplyingToAccessAndUse`, store its value in `custom {"rndt": {'constraints_other': value}}`
    - else store its `text()` + the text value extracted in "access constraints" if any in `custom {"rndt": {'constraints_other': value}}`
  - else `gco:CharacterString` should exist
    - store its `text()` + the text value extracted in "access constraints" if any in `custom {"rndt": {'constraints_other': value}}`
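The use constraints step could look roughly like this. Again a sketch under assumptions: `parse_use_constraints`, `known_condition_uris` (standing in for the URIs of the `ConditionsApplyingToAccessAndUse` thesaurus) and `access_text` (the free text kept from the access constraints step) are hypothetical names, and joining the two texts with a space is a guess, since the "+" above does not say how they should be combined.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
    "gmx": "http://www.isotc211.org/2005/gmx",
    "xlink": "http://www.w3.org/1999/xlink",
}
HREF = "{%s}href" % NS["xlink"]


def _join_texts(*parts):
    """Join the non-empty texts with a single space (assumed separator)."""
    return " ".join(p.strip() for p in parts if p and p.strip())


def parse_use_constraints(root, known_condition_uris, access_text=None):
    """Return the custom dict fragment for the use constraints, or {} if absent."""
    for legal in root.iter("{%s}MD_LegalConstraints" % NS["gmd"]):
        code = legal.find("gmd:useConstraints/gmd:MD_RestrictionCode", NS)
        if code is None or code.get("codeListValue") != "otherRestrictions":
            continue
        other = legal.find("gmd:otherConstraints", NS)
        if other is None:
            continue
        anchor = other.find("gmx:Anchor", NS)
        if anchor is not None:
            uri = anchor.get(HREF)
            if uri in known_condition_uris:
                value = uri
            else:
                # Unknown URI: fall back to the Anchor text plus the access text.
                value = _join_texts(anchor.text, access_text)
        else:
            cs = other.find("gco:CharacterString", NS)
            value = _join_texts(cs.text if cs is not None else None, access_text)
        return {"rndt": {"constraints_other": value}}
    return {}
```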
- Add parsing for resolutions --> custom `rndt`
- Add parsing for accuracy --> custom `rndt`
For other info about constraints also see #59.