italia/daf-ontologie-vocabolari-controllati

Issues identified in Semantic Assets during initial NDC harvesting

Closed this issue · 28 comments

@giorgialodi @ioggstream @spuliz FYI & A, we need to course correct above data.

@SrinivasanTarget not sure that dcat:distribution applies to ontologies, since:

dcat:distribution
  rdfs:range dcat:Distribution ;
  skos:definition "An available distribution of the dataset."@en
.

dcat:Distribution 
    rdfs:comment "A specific representation of a dataset..."@en ;

@ioggstream it applies, do not worry :) This follows from ADMS, the metadata ontology we use in OntoPiA, which in turn, according to ADMS-AP, is based on DCAT-AP.

Dear @SrinivasanTarget @ioggstream I reconstructed the modelling using ADMS. In essence:

  1. An Ontology is a SemanticAsset, which in turn is also a Dataset according to DCAT-AP_IT.
  2. A SemanticAsset has a distribution.
  3. To define point 2, in the ontology we introduced a property that says :SemanticAsset :hasSemanticAssetDistribution :SemanticAssetDistribution.
  4. The SemanticAssetDistribution is a Distribution according to DCAT-AP_IT.
  5. According to the points above, since a SemanticAsset is a Dataset and a SemanticAssetDistribution is a Distribution, we can say that :hasSemanticAssetDistribution is a specialization of dcat:distribution.

In essence, from a semantic perspective there are no issues for the ontologies. From a technical perspective, you have to look for hasSemanticAssetDistribution rather than dcat:distribution directly.
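A minimal sketch of what that could mean for a harvester, assuming triples are held as plain (subject, predicate, object) tuples of prefixed names. The asset and distribution names are hypothetical, and the sub-property mapping encodes the specialization described above as an assumption, not something read from the published ontology:

```python
DCAT_DISTRIBUTION = "dcat:distribution"
SEMANTIC_ASSET_DISTRIBUTION = ":hasSemanticAssetDistribution"

# Assumed mapping: sub-property -> super-property, following the reasoning above.
SUB_PROPERTY_OF = {SEMANTIC_ASSET_DISTRIBUTION: DCAT_DISTRIBUTION}


def distributions_of(asset, triples):
    """Return objects linked to `asset` by dcat:distribution or a declared sub-property."""
    accepted = {DCAT_DISTRIBUTION}
    accepted.update(sub for sub, sup in SUB_PROPERTY_OF.items() if sup == DCAT_DISTRIBUTION)
    return [o for s, p, o in triples if s == asset and p in accepted]


# Hypothetical data: an ontology described only through the specialized property.
example_triples = [
    (":MyOntology", "rdf:type", ":SemanticAsset"),
    (":MyOntology", SEMANTIC_ASSET_DISTRIBUTION, ":MyOntologyTurtle"),
]
print(distributions_of(":MyOntology", example_triples))
```

Looking only for dcat:distribution would return nothing for this asset, which is the situation the comment above warns about.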

The other points are typos, I think, and we need to correct them. Thanks for spotting them!

@giorgialodi Thanks for the clarification. Your suggestions are now incorporated. Below is a summary of the issues we still need to fix:

./daf-ontologie-vocabolari-controllati/VocabolariControllati/ - contains only valid CVs

./daf-ontologie-vocabolari-controllati/vocs-deprecated/ or daf-ontologie-vocabolari-controllati/deprecated/ - contains deprecated assets
  • <Several views as CSV's for Single Semantic Asset>

@ioggstream @giorgialodi do we have a date for fixing the above issues in the repo?

@SrinivasanTarget this "Unable to extract node summary from resource 'http://dati.gov.it/onto/covapit/' using 'http://purl.org/dc/terms/rightsHolder'": where does it originate?

@SrinivasanTarget this InvalidModelException: Cannot load RDF model from '/VocabolariControllati/classifications-for-her/erc-panel-h2020-fp/erc-panel-h2020-fp.ttl' check here --> what's the problem? I validated the file and it seems correct.

@SrinivasanTarget @ioggstream @spuliz @mfortini I fixed some issues here; the PRs still need to be merged. In particular, the following ones:

  1. Unable to extract node summary from resource 'https://w3id.org/italia/controlled-vocabulary/theme-subtheme-mapping' using 'http://purl.org/dc/terms/rightsHolder'
  2. InvalidModelException: Unable to extract node summary from resource 'http://dati.gov.it/onto/dcatapit' using 'http://purl.org/dc/terms/rightsHolder'
  3. Invalid xsd date format in /classifications-for-universities/academic-disciplines/academic-disciplines.ttl
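The thread does not say what the invalid date looked like. As a rough illustration only, a check for the xsd:date lexical form could be sketched as below. It is simplified to four-digit years and does not range-check the optional timezone, and the sample values are hypothetical:

```python
import re
from datetime import date

# Simplified xsd:date pattern: YYYY-MM-DD with an optional timezone (Z or +hh:mm / -hh:mm).
_XSD_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})(Z|[+-]\d{2}:\d{2})?")


def is_valid_xsd_date(lexical):
    """Return True if `lexical` looks like a valid xsd:date (simplified check)."""
    match = _XSD_DATE.fullmatch(lexical)
    if match is None:
        return False
    year, month, day = (int(match.group(i)) for i in (1, 2, 3))
    try:
        date(year, month, day)  # rejects impossible dates such as 30 February
    except ValueError:
        return False
    return True


# Hypothetical sample values, not taken from academic-disciplines.ttl.
for sample in ("2019-04-15", "15/04/2019", "2019-02-30"):
    print(sample, is_valid_xsd_date(sample))
```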

For the following ones:

  1. InvalidModelException: Cannot load RDF model from '/VocabolariControllati/classifications-for-her/erc-panel-h2020-fp/erc-panel-h2020-fp.ttl' check here
  2. Unable to extract node summary from resource 'http://dati.gov.it/onto/covapit/' using 'http://purl.org/dc/terms/rightsHolder'

I need more details.

Finally, Caused by: org.apache.jena.rdf.model.LiteralRequiredException: https://raw.githubusercontent.com/italia/daf-ontologie-vocabolari-controllati/master/Ontologie/AtlasOfPaths/v0.1/AtlasOfPaths-AP_IT.png check here for more details --> for me this is fine, not an error, since dct:description can be used with non-literal values. Can you skip the value of this property?
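One way to read "skip the value" is to keep literal descriptions and silently ignore IRI values instead of raising. The sketch below assumes each value is a small dict with "type" and "value" keys, in the style of SPARQL JSON results. That representation and the sample description text are assumptions for illustration, not the harvester's actual data model:

```python
def literal_descriptions(values):
    """Return the text of literal values, skipping IRIs and blank nodes."""
    return [v["value"] for v in values if v.get("type") == "literal"]


# Hypothetical dct:description values for one ontology: one literal, one image IRI.
description_values = [
    {"type": "literal", "value": "Example description"},
    {
        "type": "uri",
        "value": "https://raw.githubusercontent.com/italia/daf-ontologie-vocabolari-controllati/master/Ontologie/AtlasOfPaths/v0.1/AtlasOfPaths-AP_IT.png",
    },
]
print(literal_descriptions(description_values))
```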

@giorgialodi this is fixed at our end today.

InvalidModelException: Cannot load RDF model from '/VocabolariControllati/classifications-for-her/erc-panel-h2020-fp/erc-panel-h2020-fp.ttl' check here
Unable to extract node summary from resource 'http://dati.gov.it/onto/covapit' using 'http://purl.org/dc/terms/rightsHolder'

@giorgialodi these issues seem to be fixed too.

@giorgialodi Is it a valid semantic asset?

Also for cities, we are getting Caused by: org.apache.jena.shared.PropertyNotFoundException: http://purl.org/dc/terms/identifier?

No, that is an example of data. It is data, not a controlled vocabulary or an ontology.

I will check, but this vocabulary was produced automatically.

@giorgialodi

<http://purl.org/dc/terms/rightsHolder> <https://w3id.org/italia/data/public-organization/ISTAT> ;

........
<https://w3id.org/italia/data/resource/organization/public-organization/ISTAT> a <http://dati.gov.it/onto/dcatapit#Agent> , <http://xmlns.com/foaf/0.1/Agent> ;
	<http://purl.org/dc/terms/identifier> "ISTAT" ;
	<http://xmlns.com/foaf/0.1/name> "Istituto Nazionale di Statistica"@it , "Italian National Institute of Statistics"@en .

The rightsHolder URL seems to be wrong: it does not match the URI under which the agent is described.
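A mismatch like this can be found mechanically by listing rightsHolder objects that never appear as a subject in the same file. This is a minimal sketch over plain (subject, predicate, object) tuples. The subject of the first triple is elided in the snippet above, so a hypothetical placeholder stands in for it, and whether this is exactly what triggers the "Unable to extract node summary" error is an assumption:

```python
RIGHTS_HOLDER = "http://purl.org/dc/terms/rightsHolder"


def undescribed_objects(triples, predicate=RIGHTS_HOLDER):
    """Return objects of `predicate` that are never used as a subject."""
    subjects = {s for s, _, _ in triples}
    return sorted({o for _, p, o in triples if p == predicate and o not in subjects})


snippet_triples = [
    # "urn:example:some-vocabulary" is a placeholder for the elided subject.
    ("urn:example:some-vocabulary", RIGHTS_HOLDER,
     "https://w3id.org/italia/data/public-organization/ISTAT"),
    ("https://w3id.org/italia/data/resource/organization/public-organization/ISTAT",
     "http://purl.org/dc/terms/identifier", "ISTAT"),
]
print(undescribed_objects(snippet_triples))
```

Here the rightsHolder value is reported because the agent is described under a different URI.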

@giorgialodi Since we process assets only in leaf-level folders, the folders below prevent us from processing the actual assets.

VocabolariControllati/territorial-classifications/provinces/scriptR2RML
VocabolariControllati/territorial-classifications/geographical-distribution/scriptR2RML
VocabolariControllati/territorial-classifications/cities/scriptR2RML

Can we please remove them? Thoughts, please.

You are right @SrinivasanTarget, where is it? In which voc or ontology?

In CV.

Hmmm, removing them would really be a shame. Those are the scripts that transform the data into the published RDF vocabulary. It is important knowledge to provide to end users. Can't you skip them?

Do you remember which one? So I can correct it directly?

@giorgialodi what's your suggestion on how to skip them? Should any folder named scriptR2RML be skipped? Any other suggestions for skip rules?
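For illustration, a name-based skip rule could look like the sketch below: a folder counts as an asset folder if it holds at least one .ttl file and has no subfolders other than ignored ones. The function name and the ignore list (scriptR2RML, plus the sparql directory mentioned elsewhere in this thread) are assumptions, not the harvester's actual algorithm:

```python
import os

# Assumed ignore list; directory names are taken from this thread.
IGNORED_DIRS = frozenset({"scriptR2RML", "sparql"})


def find_asset_folders(root, ignored=IGNORED_DIRS):
    """Return leaf folders (relative to `root`) that contain .ttl files.

    Ignored directories are neither visited nor counted as subfolders,
    so their parent can still qualify as a leaf.
    """
    asset_folders = []
    for current, subdirs, files in os.walk(root):
        # Prune in place so os.walk never descends into ignored directories.
        subdirs[:] = [d for d in subdirs if d not in ignored]
        is_leaf = not subdirs
        has_turtle = any(name.endswith(".ttl") for name in files)
        if is_leaf and has_turtle:
            asset_folders.append(os.path.relpath(current, root))
    return sorted(asset_folders)
```

With such a rule, territorial-classifications/cities would be picked up even though it still contains scriptR2RML, and the scripts could stay where they are.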

@giorgialodi As changing the folder-search algorithm is time consuming, can we move all the scriptR2RML folders under /VocabolariControllati? Is it possible to rearrange them as below?

/VocabolariControllati/scriptR2RML/cities/*.ttl
/VocabolariControllati/scriptR2RML/geographical-distribution/*.ttl

/VocabolariControllati/territorial-classifications/cities/cities.ttl
/VocabolariControllati/territorial-classifications/geographical-distribution/geographical-distribution.ttl

@SrinivasanTarget it is not only that: there is another case in which a CV has a directory named sparql. From my point of view, moving these directories makes the repo less clear for its users; however, since the priority is NDC and the files within scriptR2RML carry the names of the vocs, I think we can do it. I am not sure about the sparql directory; I am going to check. I am also thinking about how to deal with the CV that has more than one Turtle file.

@SrinivasanTarget @ioggstream I have the following proposals:

  1. sparql directory in transparency-titulus is moved to the root
  2. scriptR2RML in the three territorial classifications is moved to the root
  3. transparency-obligation has more than one voc in the directory. Let's create another directory transparency-obligation-organisation and move the second turtle and all the rest in this directory

If it's ok for Giorgia, it is for me. Probably shaping the repo on the temporary algorithm is suboptimal, but we're all committed to achieving the goal :)

@simonetw how many of the above issues still stand? Can you mark the fixed issues pls?

I think this has been fixed. I will close the issue.