ropensci/dataspice

Odd attributes table behaviour in index.html when description and unitText empty.

annakrystalli opened this issue · 9 comments

When the description and unitText columns in attributes.csv are empty, the name and description fields of the attributes table in index.html are populated with the title and description fields from biblio.csv. 🤔

Not had a chance to trace it. Will try to but just wanted to flag it.

As far as I can tell, the error is not in the JSON file itself; it appears when the file is parsed during build_site().

OK, I think I've located the issue in the template. There seems to be no name field in variableMeasured objects (sorry, I'm not too well versed in JSON terminology, hope this makes sense). At the minute, I'm working on the dataspice workshop I'll be running on Tuesday, and everything seems to work OK when the attributes.csv table is completed, except that the attributes name field is still pulling the title from biblio 🤷‍♀️
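
One possible explanation for the fallback, assuming the site template is Mustache-style (the thread does not say which engine is used): when a key is missing from the current item, Mustache looks it up in the enclosing context, so a name tag inside a variableMeasured loop would pick up the dataset-level name instead. The sketch below imitates that lookup in plain Python. The `resolve` helper and all sample values are hypothetical and are not taken from dataspice.

```python
# Hypothetical illustration of Mustache-style context lookup; not dataspice code.

def resolve(key, context_stack):
    """Return the first value found for key, searching the innermost context first."""
    for context in reversed(context_stack):
        if key in context:
            return context[key]
    return ""  # Mustache renders an unresolved tag as an empty string


# Made-up dataset-level metadata with one variable that has no "name" key.
dataset = {
    "name": "Example dataset title",
    "description": "Example dataset description",
    "variableMeasured": [{"unitText": "m"}],
}

variable = dataset["variableMeasured"][0]
stack = [dataset, variable]  # outer context first, current item last

rendered_name = resolve("name", stack)      # falls through to the dataset title
rendered_unit = resolve("unitText", stack)  # found on the variable itself
print(rendered_name, "|", rendered_unit)
```

If this is what is happening, giving every variableMeasured object its own name key would stop the lookup from reaching the dataset level.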

In all honesty, I'm a little confused myself as to what name and value in the attributes table are supposed to show. Additionally, because we sometimes have duplicate entries of the same variable across different tables (we might want to consider handling these better in future), I suggest changing what the attributes table presents to match what folks actually recorded in attributes.csv, as follows:

  • file name: to show which file the variable is found in
  • variable name: the variable name
  • description: the description
  • units: the unitText

Does anyone have any objections? @amoeba @maelle @aurielfournier @khondula @cboettig If not, do you mind if I just make the change and push to master? (I've made small bug fixes and pushed already, sorry!)

Actually, after doing a bit more digging on schema.org, I don't think the above suggestion will be straightforward:

  1. I can't find an obvious property for fileName so ignore that suggestion for now. I do however suggest running distinct() on the attributes table after fileName has been removed during write_spice().

  2. Looking into the definition of value:

The value of the quantitative value or property value node.

For QuantitativeValue and MonetaryAmount, the recommended type for values is 'Number'.
For PropertyValue, it can be 'Text', 'Number', 'Boolean', or 'StructuredValue'.

it seems that, if we were including the actual data, value would be the actual data values. So I'm not sure whether it is useful being included as part of the attributes table in index.html.

So my updated proposal is for the attributes table to include only these columns:

  • name: the variable name
  • description: the variable description
  • units: the unitText
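
As a rough sketch of the proposal above (drop fileName, remove the resulting duplicates, and keep only name, description and units), the Python below builds the table rows from attribute records. It is illustrative only: the `attribute_rows` function, the `variableName` key and the sample records are assumptions, not dataspice code.

```python
# Hypothetical sketch: neither the function nor the sample rows come from dataspice.

def attribute_rows(attributes):
    """Drop fileName, remove exact duplicates, and keep first-seen order."""
    seen = set()
    rows = []
    for record in attributes:
        row = (
            record.get("variableName", ""),  # assumed key for the variable name
            record.get("description", ""),
            record.get("unitText", ""),
        )
        if row not in seen:
            seen.add(row)
            rows.append({"name": row[0], "description": row[1], "units": row[2]})
    return rows


# Made-up records: site_id appears in two files with identical metadata.
attributes = [
    {"fileName": "a.csv", "variableName": "site_id",
     "description": "Site identifier", "unitText": ""},
    {"fileName": "b.csv", "variableName": "site_id",
     "description": "Site identifier", "unitText": ""},
    {"fileName": "b.csv", "variableName": "height",
     "description": "Plant height", "unitText": "m"},
]

rows = attribute_rows(attributes)
for row in rows:
    print(row)
```

Only rows that match on all three fields are collapsed, so a variable that is described differently in two files would still appear twice.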

Any feedback would be really appreciated!

This is what my example is currently being parsed as:

image

and units are not properly parsed either:

image


With the changes in PR #67 accepted, the example would look like this:

image

and units are properly parsed:

image

🎉

Yeah, I think https://schema.org/value is a bit of a weird beast. I believe it's basically just the "column type," e.g. text string, number, boolean, dateTime, etc., which I guess can be a useful thing to tell readr (or to map to EML's data types). Could be useful but not crucial.

For unit names, looks like you're going with EML-style names? I guess that's good because it's a common standard, though in general I think unit names that could be parsed by the units package might be preferable? @amoeba thoughts?

Yeah, I think https://schema.org/value is a bit of a weird beast. I believe it's basically just the "column type," e.g. text string, number, boolean, dateTime, etc., which I guess can be a useful thing to tell readr (or to map to EML's data types). Could be useful but not crucial.

So, I thought that as well initially, but the more I looked at it, the more I was convinced that the definition in schema.org was not defining the values the value field could take, but more the data type of any values in the value field. In any case, as we cannot record data type as part of the dataspice workflow, I feel it's safe to remove for now.

For unit names, looks like you're going with EML-style names? I guess that's good because it's a common standard, though in general I think unit names that could be parsed by the units package might be preferable?

Ha, busted! For this example dataset, I just lifted the attributes table that went with the example NEON dataset, which I assume is EML. Will change for the workshop to unit names that units can parse. 👍

PS, tutorial is shaping up!

Still got a few loose ends to tie up, but I will tweet out the final versions later today and report back on how the workshop went this afternoon. We can then add a link in the dataspice repo.

Wish me luck!

Closed via #67

Yeah, I see what you mean with value, looking at the examples under https://schema.org/PropertyValue , it seems to really be the literal value in the data cell, e.g.

{
  "@type": "PropertyValue",
  "name": "Wifi range",
  "value": 30,
  "unitCode": "FOT"
}

is the schema.org way of saying that this "wifi router has a range of 30 ft". So yeah, probably not what we want. The way it is used in the earthcube example I sometimes crib from is also confusing: there it is used to describe a column rather than an individual value, but in that context it seems like they are using it for what ought to be name instead, just as you point out. Anyway, seems safe to ignore for now.

The tutorial and slides look awesome! ✨ Good luck & knock their socks off!