Issues identified in Semantic Assets during initial NDC harvesting
Closed this issue · 28 comments
- Unable to extract node summary from resource 'https://w3id.org/italia/controlled-vocabulary/theme-subtheme-mapping' using 'http://purl.org/dc/terms/rightsHolder'
- InvalidModelException: Unable to extract node summary from resource 'http://dati.gov.it/onto/dcatapit' using 'http://purl.org/dc/terms/rightsHolder'
- Cannot find node 'http://www.w3.org/ns/dcat#distribution' for resource 'https://w3id.org/italia/onto/PublicContract'
- Cannot find node 'http://www.w3.org/ns/dcat#distribution' for resource 'https://w3id.org/italia/onto/CulturalHeritage'
- Cannot find node 'http://www.w3.org/ns/dcat#distribution' for resource 'https://w3id.org/italia/onto/Indicator'
- Cannot find node 'http://www.w3.org/ns/dcat#distribution' for resource 'https://w3id.org/italia/onto/PARK'
- Cannot find node 'http://www.w3.org/ns/dcat#distribution' for resource 'https://w3id.org/italia/onto/Language'
- Cannot find node 'http://www.w3.org/ns/dcat#distribution' for resource 'https://w3id.org/italia/onto/RO'
- Cannot find node 'http://www.w3.org/ns/dcat#distribution' for resource 'https://w3id.org/italia/onto/IoT'
- Cannot find node 'http://www.w3.org/ns/dcat#distribution' for resource 'https://w3id.org/italia/onto/Project'
- Cannot find node 'http://www.w3.org/ns/dcat#distribution' for resource 'https://w3id.org/italia/onto/POI'
- Cannot find node 'http://www.w3.org/ns/dcat#distribution' for resource 'https://w3id.org/italia/onto/POT'
- Cannot find node 'http://www.w3.org/ns/dcat#distribution' for resource 'https://w3id.org/italia/onto/HER'
- Caused by: org.apache.jena.rdf.model.LiteralRequiredException: https://raw.githubusercontent.com/italia/daf-ontologie-vocabolari-controllati/master/Ontologie/AtlasOfPaths/v0.1/AtlasOfPaths-AP_IT.png check here for more details
- Cannot find node 'http://www.w3.org/ns/dcat#distribution' for resource 'https://w3id.org/italia/onto/CPV'
- Cannot find node 'http://www.w3.org/ns/dcat#distribution' for resource 'https://w3id.org/italia/onto/CPSV'
- Cannot find node 'http://www.w3.org/ns/dcat#distribution' for resource 'https://w3id.org/italia/onto/l0'
- Cannot find node 'http://www.w3.org/ns/dcat#distribution' for resource 'https://w3id.org/italia/onto/ACCO'
- Cannot find node 'http://www.w3.org/ns/dcat#distribution' for resource 'https://w3id.org/italia/onto/AccessCondition'
- Cannot find node 'http://www.w3.org/ns/dcat#distribution' for resource 'https://w3id.org/italia/onto/COV'
- Unable to extract node summary from resource 'http://dati.gov.it/onto/covapit/' using 'http://purl.org/dc/terms/rightsHolder'
- Cannot find node 'http://www.w3.org/ns/dcat#distribution' for resource 'https://w3id.org/italia/onto/ADMS'
- Cannot find node 'http://www.w3.org/ns/dcat#distribution' for resource 'https://w3id.org/italia/onto/CLV'
- Cannot find node 'http://www.w3.org/ns/dcat#distribution' for resource 'https://w3id.org/italia/onto/CPEV'
- InvalidModelException: Cannot load RDF model from '/VocabolariControllati/classifications-for-her/erc-panel-h2020-fp/erc-panel-h2020-fp.ttl' check here
- Invalid xsd date format in /classifications-for-universities/academic-disciplines/academic-disciplines.ttl
@giorgialodi @ioggstream @spuliz FYI & A, we need to course-correct the above data.
@SrinivasanTarget not sure that dcat:distribution applies to ontologies, since:
dcat:distribution
    rdfs:range dcat:Distribution ;
    skos:definition "An available distribution of the dataset."@en
.
dcat:Distribution
    rdfs:comment "A specific representation of a dataset..."@en ;
@ioggstream it applies, do not worry :) This follows from the metadata ontology ADMS we use in OntoPiA (which in turn, according to ADMS-AP, is based on DCAT-AP).
Dear @SrinivasanTarget @ioggstream I reconstructed the modelling using ADMS. In essence,
- an Ontology is a SemanticAsset, which in turn is also a Dataset according to DCAT-AP_IT.
- a SemanticAsset has a distribution.
- to express the previous point, the ontology introduces a property stating that :SemanticAsset :hasSemanticAssetDistribution :SemanticAssetDistribution.
- the SemanticAssetDistribution is a Distribution according to DCAT-AP_IT.
- given the points above, since a SemanticAsset is a Dataset and a SemanticAssetDistribution is a Distribution, we can say that :hasSemanticAssetDistribution is a specialization of dcat:distribution.
In essence, from a semantic perspective there are no issues for the ontologies. From a technical perspective, you have to take hasSemanticAssetDistribution into account instead of using dcat:distribution directly.
I think the other points are typos and we need to correct them. Thanks for spotting them!
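As a rough illustration of the technical point above, a harvester could look for distributions through the specialized property first and fall back to dcat:distribution. This is only a sketch over plain (subject, predicate, object) tuples, not the NDC harvester's actual code; the full IRI used for hasSemanticAssetDistribution and the function name `find_distributions` are assumptions made for the example.

```python
# Assumption: the full IRI of hasSemanticAssetDistribution is a guess at the
# ADMS-AP_IT namespace; replace it with the IRI declared in the ontology.
DCAT_DISTRIBUTION = "http://www.w3.org/ns/dcat#distribution"
HAS_SA_DISTRIBUTION = "https://w3id.org/italia/onto/ADMS/hasSemanticAssetDistribution"

# Properties to try, most specific first.
DISTRIBUTION_PROPERTIES = (HAS_SA_DISTRIBUTION, DCAT_DISTRIBUTION)


def find_distributions(triples, resource):
    """Return the distribution nodes of `resource`, without duplicates.

    `triples` is a list of (subject, predicate, object) strings.
    """
    found = []
    for prop in DISTRIBUTION_PROPERTIES:
        for s, p, o in triples:
            if s == resource and p == prop and o not in found:
                found.append(o)
    return found
```

With this shape, an ontology that only declares hasSemanticAssetDistribution no longer triggers a "Cannot find node" error, while plain datasets that use dcat:distribution keep working.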
@giorgialodi Thanks for the clarification. Your suggestions are incorporated now. Below is a summary of the issues we still need to fix:
- Unable to extract node summary from resource 'https://w3id.org/italia/controlled-vocabulary/theme-subtheme-mapping' using 'http://purl.org/dc/terms/rightsHolder'
- InvalidModelException: Unable to extract node summary from resource 'http://dati.gov.it/onto/dcatapit' using 'http://purl.org/dc/terms/rightsHolder'
- Caused by: org.apache.jena.rdf.model.LiteralRequiredException: https://raw.githubusercontent.com/italia/daf-ontologie-vocabolari-controllati/master/Ontologie/AtlasOfPaths/v0.1/AtlasOfPaths-AP_IT.png check here for more details
- InvalidModelException: Cannot load RDF model from '/VocabolariControllati/classifications-for-her/erc-panel-h2020-fp/erc-panel-h2020-fp.ttl' check here
- Unable to extract node summary from resource 'http://dati.gov.it/onto/covapit/' using 'http://purl.org/dc/terms/rightsHolder'
- Invalid xsd date format in /classifications-for-universities/academic-disciplines/academic-disciplines.ttl
- Instead of placing the deprecated folders like this daf-ontologie-vocabolari-controllati/VocabolariControllati/vocs-deprecated, can we organise the folder structure like below?
./daf-ontologie-vocabolari-controllati/VocabolariControllati/ - contains only valid CVs
./daf-ontologie-vocabolari-controllati/vocs-deprecated/ or daf-ontologie-vocabolari-controllati/deprecated/ - contains deprecated assets
- <Several views as CSVs for a single Semantic Asset>
@ioggstream @giorgialodi do we have a date for fixing the above issues in the repo?
@SrinivasanTarget working on it
@SrinivasanTarget this "Unable to extract node summary from resource 'http://dati.gov.it/onto/covapit/' using 'http://purl.org/dc/terms/rightsHolder'": where does it originate?
@SrinivasanTarget this InvalidModelException: Cannot load RDF model from '/VocabolariControllati/classifications-for-her/erc-panel-h2020-fp/erc-panel-h2020-fp.ttl' check here --> what's the problem? I validated the file and it seems correct.
@SrinivasanTarget @ioggstream @spuliz @mfortini I fixed some issues here and the PRs must be merged. In particular the following ones:
- Unable to extract node summary from resource 'https://w3id.org/italia/controlled-vocabulary/theme-subtheme-mapping' using 'http://purl.org/dc/terms/rightsHolder'
- InvalidModelException: Unable to extract node summary from resource 'http://dati.gov.it/onto/dcatapit' using 'http://purl.org/dc/terms/rightsHolder'
- Invalid xsd date format in /classifications-for-universities/academic-disciplines/academic-disciplines.ttl
For the following ones:
- InvalidModelException: Cannot load RDF model from '/VocabolariControllati/classifications-for-her/erc-panel-h2020-fp/erc-panel-h2020-fp.ttl' check here
- Unable to extract node summary from resource 'http://dati.gov.it/onto/covapit/' using 'http://purl.org/dc/terms/rightsHolder'
I need more details.
Finally, Caused by: org.apache.jena.rdf.model.LiteralRequiredException: https://raw.githubusercontent.com/italia/daf-ontologie-vocabolari-controllati/master/Ontologie/AtlasOfPaths/v0.1/AtlasOfPaths-AP_IT.png check here for more details --> for me this is fine, not an error, since dct:description can be used with non-literal values. Can you skip the value of this property?
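One way to "skip the value" when a property holds an IRI instead of a literal is to filter by node type rather than assume every object is a literal. The sketch below uses stand-in `Literal` and `IRI` types defined only for this example; it is not Jena code and does not describe how the harvester was actually changed.

```python
from typing import NamedTuple, Optional

DCT_DESCRIPTION = "http://purl.org/dc/terms/description"


class Literal(NamedTuple):
    """Stand-in for an RDF literal (hypothetical type for this sketch)."""
    value: str
    lang: Optional[str] = None


class IRI(NamedTuple):
    """Stand-in for an RDF resource reference (hypothetical type)."""
    value: str


def literal_values(triples, subject, predicate):
    """Collect only the literal objects, skipping IRI-valued ones."""
    return [
        o for s, p, o in triples
        if s == subject and p == predicate and isinstance(o, Literal)
    ]
```

A description that points to an image is then ignored instead of raising an exception, and any textual descriptions on the same resource are still harvested.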
@giorgialodi this is fixed at our end today.
InvalidModelException: Cannot load RDF model from '/VocabolariControllati/classifications-for-her/erc-panel-h2020-fp/erc-panel-h2020-fp.ttl' check here
Unable to extract node summary from resource 'http://dati.gov.it/onto/covapit' using 'http://purl.org/dc/terms/rightsHolder'
@giorgialodi these issues seem to be fixed too.
@giorgialodi Is it a valid semantic asset?
Also for cities, we are getting Caused by: org.apache.jena.shared.PropertyNotFoundException: http://purl.org/dc/terms/identifier?
@giorgialodi Is it a valid semantic asset?
No, that is an example of data. It is data, not a controlled vocabulary or an ontology.
Also for cities, we are getting Caused by: org.apache.jena.shared.PropertyNotFoundException: http://purl.org/dc/terms/identifier?
I will check, but this vocabulary was produced automatically
@SrinivasanTarget are you sure about cities? Because in that vocabulary I see that property http://purl.org/dc/terms/identifier "https://w3id.org/italia/controlled-vocabulary/territorial-classifications/cities" ;
<http://purl.org/dc/terms/rightsHolder> <https://w3id.org/italia/data/public-organization/ISTAT> ;
........
<https://w3id.org/italia/data/resource/organization/public-organization/ISTAT> a <http://dati.gov.it/onto/dcatapit#Agent> , <http://xmlns.com/foaf/0.1/Agent> ;
<http://purl.org/dc/terms/identifier> "ISTAT" ;
<http://xmlns.com/foaf/0.1/name> "Istituto Nazionale di Statistica"@it , "Italian National Institute of Statistics"@en .
rightsHolder URL seems to be wrong.
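In the snippet above, the rightsHolder value and the described agent use different IRIs. A check along the following lines could flag that kind of mismatch before harvesting. It is a minimal sketch over plain tuples; the function name and the `ex:vocabulary` subject are made up for the example.

```python
DCT_RIGHTS_HOLDER = "http://purl.org/dc/terms/rightsHolder"


def undescribed_rights_holders(triples):
    """Return rightsHolder IRIs that never appear as the subject of a triple."""
    subjects = {s for s, _, _ in triples}
    return sorted({
        o for _, p, o in triples
        if p == DCT_RIGHTS_HOLDER and o not in subjects
    })


# Example data: "ex:vocabulary" is a placeholder subject.
triples = [
    ("ex:vocabulary", DCT_RIGHTS_HOLDER,
     "https://w3id.org/italia/data/public-organization/ISTAT"),
    ("https://w3id.org/italia/data/resource/organization/public-organization/ISTAT",
     "http://purl.org/dc/terms/identifier", "ISTAT"),
]
dangling = undescribed_rights_holders(triples)
```

Here `dangling` contains the rightsHolder IRI, because the agent is described under a different IRI.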
@giorgialodi Since we only process assets in leaf-level folders, the folders below prevent us from processing the actual assets:
VocabolariControllati/territorial-classifications/provinces/scriptR2RML
VocabolariControllati/territorial-classifications/geographical-distribution/scriptR2RML
VocabolariControllati/territorial-classifications/cities/scriptR2RML
Can we please remove them? Thoughts please.
rightsHolder URL seems to be wrong.
You are right @SrinivasanTarget, where is it? In which vocabulary or ontology?
You are right @SrinivasanTarget, where is it? In which vocabulary or ontology?
In CV.
Can we please remove them? Thoughts please.
Hmmm, removing them would really be a shame. Those are the scripts for transforming the data into the published RDF vocabulary. It is important knowledge to provide to end users. Can't you skip them?
In which vocabulary or ontology?
In CV.
Do you remember which one? So I can correct it directly?
Can't you skip them?
@giorgialodi what's your suggestion on how to skip them? Should folders named scriptR2RML be skipped? Any other suggestions for skip rules?
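A name-based skip rule of the kind asked about here could look like the sketch below. It assumes the harvester walks the repository on disk and treats a folder as a leaf when it contains Turtle files and no remaining subfolders; the function name `leaf_asset_dirs` and the default skip list are illustrative, not taken from the NDC code.

```python
import os

# Assumed skip list: only the folder name mentioned so far; extend as needed.
SKIPPED_DIR_NAMES = frozenset({"scriptR2RML"})


def leaf_asset_dirs(root, skipped=SKIPPED_DIR_NAMES):
    """Yield folders that hold .ttl files and have no non-skipped subfolders."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped folders in place so os.walk never descends into them.
        dirnames[:] = sorted(d for d in dirnames if d not in skipped)
        if not dirnames and any(f.endswith(".ttl") for f in filenames):
            yield dirpath
```

With the scriptR2RML folders pruned, territorial-classifications/cities counts as a leaf again and its Turtle file is picked up, without moving anything in the repository.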
@giorgialodi As changing the folder-search algorithm is time consuming, can we move all the scriptR2RML folders under /VocabolariControllati? Is it possible to rearrange like below?
/VocabolariControllati/scriptR2RML/cities/*.ttl
/VocabolariControllati/scriptR2RML/geographical-distribution/*.ttl
/VocabolariControllati/territorial-classifications/cities/cities.ttl
/VocabolariControllati/territorial-classifications/geographical-distribution/geographical-distribution.ttl
@SrinivasanTarget it is not only that; there is another case in which a CV has a directory named sparql. From my point of view, these moves make the repo less clear for its users; however, since the priority is NDC and the files within scriptR2RML carry the names of the vocabularies, I think we can do it. Not sure about the case of the sparql directory. I am going to check. I am also thinking about how to deal with the CV that has more than one turtle file.
@SrinivasanTarget @ioggstream I have the following proposals:
- sparql directory in transparency-titulus is moved to the root
- scriptR2RML in the three territorial classifications is moved to the root
- transparency-obligation has more than one voc in the directory. Let's create another directory transparency-obligation-organisation and move the second turtle and all the rest in this directory
If it's ok for Giorgia, it is for me. Probably shaping the repo on the temporary algorithm is suboptimal, but we're all committed to achieving the goal :)
@simonetw how many of the above issues still stand? Can you mark the fixed issues pls?
I think this has been fixed. I will close the issue