Using dataspice for multiple datasets
robitalec opened this issue · 0 comments
Continuing our discussion from #110, I found two obvious hurdles when using dataspice for multiple datasets. In this example, I split the mtcars example data into two files with uneven, overlapping sets of columns and distinct sets of rows. I then use create_spice, prep_attributes and prep_access, followed by edit_*, to set up the metadata files.
Setup
library(dataspice)
dir.create('data')
write.csv(mtcars[1:10, 1:4], 'data/mtcars1.csv')
write.csv(mtcars[11:20, 2:6], 'data/mtcars2.csv')
create_spice()  # creates the template CSVs in data/metadata
prep_access()
# The following fileNames have been added to the access file: mtcars1.csv, mtcars2.csv
prep_attributes()
# The following variableNames have been added to the attributes file for mtcars1.csv: X1, mpg, cyl, disp, hp
# The following variableNames have been added to the attributes file for mtcars2.csv: X1, cyl, disp, hp, drat, wt
# Warning messages:
# 1: Missing column names filled in: 'X1' [1]
# 2: Missing column names filled in: 'X1' [1]
Then I added some filler information to the metadata. Here are those files zipped: metadata.zip
edit_access()
edit_attributes()
edit_biblio()
edit_creators()
In the example biblio.csv, I added a second row for "mtcars2" by right-clicking in the Shiny app, as the app suggests. It looks like this:
read.csv('data/metadata/biblio.csv')
# title description datePublished
# 1 mtcars 1 NA 1974
# 2 mtcars 2 NA 1974
# citation
# 1 Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391–411.
# 2 Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391–411.
# keywords license funder geographicDescription northBoundCoord
# 1 NA NA NA NA 47
# 2 NA NA NA NA 57
# eastBoundCoord southBoundCoord westBoundCoord wktString startDate
# 1 -98 32 -120 NA 1974-01-01
# 2 -88 42 -110 NA 1974-01-01
# endDate
# 1 1975-01-01
# 2 1975-01-01
Challenges
In write_spice(), we get a warning from the is.na(biblio$keywords) check, which expects keywords from only one row of data.
https://github.com/ropensci/dataspice/blob/main/R/write_spice.R#L67
write_spice()
# Warning message:
# In if (is.na(biblio$keywords)) { :
#   the condition has length > 1 and only the first element will be used
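To illustrate the warning, here is a minimal sketch in base R of why the check misbehaves with two rows, along with one possible row-wise alternative. The example data frame, the comma-separated keyword format, and the name keywords_per_row are assumptions for illustration, not dataspice's actual implementation.

```r
# Hypothetical two-row biblio: one dataset has keywords, the other does not.
biblio <- data.frame(
  title = c("mtcars 1", "mtcars 2"),
  keywords = c("cars, engines", NA),
  stringsAsFactors = FALSE
)

# is.na() on a column returns one value per row, so if () receives a
# condition of length 2 rather than a single TRUE/FALSE.
is.na(biblio$keywords)

# One possible approach: handle each row on its own.
# Assumes keywords are stored as a comma-separated string.
keywords_per_row <- lapply(biblio$keywords, function(k) {
  if (is.na(k)) {
    NULL
  } else {
    trimws(strsplit(k, ",", fixed = TRUE)[[1]])
  }
})

keywords_per_row
```

Note that since R 4.2.0 a condition of length greater than one in if () is an error rather than a warning, so this check would stop write_spice() outright on newer R versions.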
In build_site(), we get an error when parsing the boxes described in data/metadata/biblio.csv. I expected this to generate two boxes, rather than the single box we get with one dataset.
build_site()
# Error: Failed to parse box in spatialCoverage$geo$box of '47 -98 32 -12057 -88 42 -110'.
If you remove the second set of east/west/north/south coordinates, the same error occurs:
build_site()
# Error: Failed to parse box in spatialCoverage$geo$box of '47 -98 32 -120NA NA NA NA'.
This error surfaces in build_site() but originates in write_spice() (L88), because the spatialCoverage it writes unexpectedly holds two names and two boxes instead of one.
write_spice()
# In dataspice.json
# ...
# "spatialCoverage": {
#   "type": "Place",
#   "name": [null, null],
#   "geo": {
#     "type": "GeoShape",
#     "box": ["47 -98 32 -120", "37 -88 42 -130"]
#   }
# }
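As a rough sketch of what I had in mind, each biblio row could produce its own spatialCoverage entry, with rows that lack coordinates skipped. The helper name spatial_coverage_for_row is hypothetical and this is not how write_spice() currently works; the coordinates are the ones from the biblio table above.

```r
# Example biblio with one row per dataset.
biblio <- data.frame(
  title = c("mtcars 1", "mtcars 2"),
  geographicDescription = c(NA_character_, NA_character_),
  northBoundCoord = c(47, 57),
  eastBoundCoord = c(-98, -88),
  southBoundCoord = c(32, 42),
  westBoundCoord = c(-120, -110),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build one spatialCoverage entry from a single row.
# Returns NULL when any coordinate is missing, so a box of
# "NA NA NA NA" is never written.
spatial_coverage_for_row <- function(row) {
  coords <- c(row$northBoundCoord, row$eastBoundCoord,
              row$southBoundCoord, row$westBoundCoord)
  if (anyNA(coords)) {
    return(NULL)
  }
  list(
    type = "Place",
    name = row$geographicDescription,
    geo = list(type = "GeoShape", box = paste(coords, collapse = " "))
  )
}

# One entry per row; each entry holds exactly one box string.
coverage <- lapply(seq_len(nrow(biblio)), function(i) {
  spatial_coverage_for_row(biblio[i, ])
})

coverage[[2]]$geo$box
```

Whether the result should be serialized as a list of Place objects or attached to separate dataset entries is a design question for the package; the sketch only shows that building the box per row keeps each one a single string.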
Within build_site(), the error is raised by the length == 1 check in parse_GeoShape_box().
biblio <- read.csv('data/metadata/biblio.csv')
box <- paste(biblio$northBoundCoord, biblio$eastBoundCoord,
             biblio$southBoundCoord, biblio$westBoundCoord)
box
# [1] "47 -98 32 -120" "37 -88 42 -130"
tokens <- stringr::str_split(box, " ")
tokens
# [[1]]
# [1] "47" "-98" "32" "-120"
# [[2]]
# [1] "37" "-88" "42" "-130"
if (!length(tokens) == 1) {
  stop("Failed to parse box in spatialCoverage$geo$box of '",
       box, "'.", call. = FALSE)
}
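For comparison, here is a hedged sketch of parsing each box string separately, so that two boxes are treated as two valid inputs rather than one malformed one. The function parse_box is a hypothetical stand-in, not the package's parse_GeoShape_box(), and the north/east/south/west order is assumed from the paste() call above.

```r
# Hypothetical per-box parser: expects a single string of four
# space-separated numbers in north, east, south, west order.
parse_box <- function(box) {
  tokens <- strsplit(box, " ", fixed = TRUE)[[1]]
  coords <- suppressWarnings(as.numeric(tokens))
  if (length(tokens) != 4 || anyNA(coords)) {
    stop("Failed to parse box in spatialCoverage$geo$box of '",
         box, "'.", call. = FALSE)
  }
  stats::setNames(coords, c("north", "east", "south", "west"))
}

# Two boxes, as written for the two-row biblio.csv above.
boxes <- c("47 -98 32 -120", "57 -88 42 -110")

# Apply the parser to each box instead of requiring exactly one.
parsed <- lapply(boxes, parse_box)

parsed[[2]][["west"]]
```

With this shape, a string such as '47 -98 32 -120NA NA NA NA' still fails, but only because that particular string is malformed, not because more than one box was supplied.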