luckinet/ontologics

dataset

Opened this issue · 4 comments

I came across your very interesting package while researching its topic for a part of a newly developed package.

The dataset package is in its infancy, but it aims to provide a way of working with statistical data (for example, data downloaded from the Eurostat data warehouse with the eurostat package) that retains all standard metadata. My package extends a data.frame, tibble, or similar object with attributes that can easily be translated to RDF metadata, or added to an RDF description of the data frame that includes its contents, its structure, valid ranges, and as much semantic information as possible about each column and each row.
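The idea of a table that carries its own metadata and can emit it as RDF can be sketched in a few lines. This is only an illustration in plain Python, not the dataset package's API: the `AnnotatedTable` class, the `metadata_to_turtle` helper, the example IRI, and the choice of Dublin Core terms are all assumptions made for the example.

```python
from dataclasses import dataclass, field

DCT = "http://purl.org/dc/terms/"  # Dublin Core Terms namespace


@dataclass
class AnnotatedTable:
    """Hypothetical stand-in for a data frame that carries its own metadata."""
    columns: dict                                 # column name -> list of values
    metadata: dict = field(default_factory=dict)  # Dublin Core term -> literal value


def escape_literal(value) -> str:
    """Escape backslashes and double quotes for a Turtle string literal."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"')


def metadata_to_turtle(table: AnnotatedTable, subject_iri: str) -> str:
    """Serialize the table-level metadata as Turtle statements about subject_iri."""
    if not table.metadata:
        raise ValueError("table has no metadata to serialize")
    statements = [
        f'dct:{term} "{escape_literal(value)}"'
        for term, value in sorted(table.metadata.items())
    ]
    body = " ;\n    ".join(statements)
    return f"@prefix dct: <{DCT}> .\n\n<{subject_iri}> {body} ."


# Illustrative data only; the IRI and the values are made up.
table = AnnotatedTable(
    columns={"geo": ["AT", "BE"], "value": [1.5, 2.5]},
    metadata={"title": "Example table", "creator": "Jane Doe"},
)
turtle = metadata_to_turtle(table, "https://example.org/dataset/1")
print(turtle)
```

The sketch covers only dataset-level metadata; describing each column and row semantically would need further vocabularies on top of this.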

The problem overlaps with your package's solutions in several ways:

  • A dataset should be described with a datacite:Subject, which is ideally a SKOS or XKOS term (XKOS is the extension of SKOS for statistical classifications).
  • A dataset's columns, whenever possible, should follow the domain-agnostic ontologies of the Statistical Data and Metadata eXchange (SDMX).
  • A future-proof dataset should be convertible to RDF.

So, while my package does not work with ontologies itself, it should be able to use them, and I would like to investigate how useful ontologics could be for that purpose. I brought this up as an issue because, if there is a match and synergy, I think that connecting dataset and ontologics could boost the utility of both packages.

Hi Daniel,

I did check out some of the material you shared online (I am currently reading the working paper). It all looks very promising, and I will surely follow the development! I currently have a (shabby) workaround in the arealDB package, where I store the metadata relevant to the particular use case in "inventory tables" (CSV files on the hard disk). So, any software solution that lets me pass metadata around in my pipeline would benefit me.

You may have seen that we have an export_as_rdf() function, but I have to admit that @rue-a is the expert on this, so if you have any questions about it, you should address them to @rue-a. However, if you'd like to use the ontology, you're welcome to do so. I am still working out how many synergies there would be, but if you have concrete ideas, please email me. Let's talk in person next year to figure things out.

For now, I can say (and perhaps you see it the same way) that we have plenty of standards and suggestions on how to use metadata, but we still rarely see them applied. Scientists who build big data projects are very reluctant to look at this stuff; it's intimidating, you have to learn new tools, etc. So, "nobody" does it. I am also still actively working on this issue by providing easy-to-use R packages that abstract much of the complicated stuff away and explain the basic principles in simple terms. I am writing this because it would be great to join forces, even if only informally, to push this effort further!

rue-a commented

Hi Daniel,

I just took a look at the discussion on rOpenSci and at https://dataset.dataobservatory.eu/. I really like what you did and would like to join the proposed informal meeting next year. Also, some things came to my mind as I read through the work:

P.S.: I also think the name "dataset" is a little confusing :)

Dear @EhrmannS and @rue-a,

@rue-a Indeed, that was my plan for a soonish release. I would like to do it in a way that allows the creation of datasets that conform to W3C Data Cube and DCAT-AP, so I would like to export to either N-Quads/Turtle or CSVW. In other words, I want to make sure that a file like https://w3c.github.io/csvw/primer/#example-11 also complies with SDMX standards, so that it does not need to be rewritten. See issue 15.
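To make the CSVW side concrete, here is a minimal sketch of a CSVW metadata document whose columns point at SDMX concepts through `propertyUrl`. It is written in plain Python rather than R and is not the planned export function: the `csvw_metadata` helper, the file name, and the column choices are assumptions for illustration, while the two namespaces are the SDMX dimension and measure vocabularies used alongside the RDF Data Cube vocabulary.

```python
import json

# SDMX concept namespaces used alongside the RDF Data Cube vocabulary.
SDMX_DIMENSION = "http://purl.org/linked-data/sdmx/2009/dimension#"
SDMX_MEASURE = "http://purl.org/linked-data/sdmx/2009/measure#"


def csvw_metadata(csv_url: str, columns: list) -> dict:
    """Build a CSVW metadata document for one CSV file.

    Each item in columns is a (name, datatype, property_url) tuple.
    """
    return {
        "@context": "http://www.w3.org/ns/csvw",
        "url": csv_url,
        "tableSchema": {
            "columns": [
                {"name": name, "titles": name, "datatype": datatype, "propertyUrl": prop}
                for name, datatype, prop in columns
            ]
        },
    }


# Hypothetical file and columns, chosen only to show the mapping.
doc = csvw_metadata(
    "observations.csv",
    [
        ("geo", "string", SDMX_DIMENSION + "refArea"),
        ("year", "gYear", SDMX_DIMENSION + "refPeriod"),
        ("value", "number", SDMX_MEASURE + "obsValue"),
    ],
)
print(json.dumps(doc, indent=2))
```

Whether such a mapping is enough for full SDMX compliance is exactly the open question above; the sketch only shows where the SDMX concept IRIs would sit in the CSVW metadata.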

I had considered connecting to zen4R for Zenodo, but that package may be a bit intimidating for most R users because it uses the R6 class system. Instead, I would like to create a connecting package that allows publishing in the format required by the EU Open Data Portal.

As for frictionless, it would probably be good to try to get a grant for that; it is overkill for me, but of course, if somebody wants to contribute...

I'll post a Doodle poll early next year for a meetup! Thank you very much for your comments!

And here is the Turtle support; it is a bit rudimentary, but I think it is getting there. See "From dataset To RDF" in version 0.3.0, which is on CRAN.