ropensci/dataspice

Using dataspice for multiple datasets

robitalec opened this issue · 0 comments

Continuing our discussion from #110, I found two obvious hurdles when using dataspice for multiple datasets. In this example, I split the mtcars example data into two files with uneven, overlapping sets of columns and distinct sets of rows. I then use create_spice, prep_attributes and prep_access, followed by edit_*, to set up the metadata files.

Setup

library(dataspice)

dir.create('data')
write.csv(mtcars[1:10, 1:4], 'data/mtcars1.csv')
write.csv(mtcars[11:20, 2:6], 'data/mtcars2.csv')

# Create the metadata templates in data/metadata before filling them in
create_spice()

prep_access()

# The following fileNames have been added to the access file: mtcars1.csv, mtcars2.csv

prep_attributes()

# The following variableNames have been added to the attributes file for mtcars1.csv: X1, mpg, cyl, disp, hp
# The following variableNames have been added to the attributes file for mtcars2.csv: X1, cyl, disp, hp, drat, wt
# Warning messages:
# 1: Missing column names filled in: 'X1' [1] 
# 2: Missing column names filled in: 'X1' [1] 
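As a side note, the 'X1' warnings are unrelated to the two hurdles below: write.csv() writes row names under an empty header by default, and that unnamed column is what gets filled in as X1. A minimal sketch of one way to avoid it, assuming we want to keep the car names as a regular column (the column name `model` and the temporary path are only illustrative):

```r
# Keep the car names as a regular, named column instead of unnamed row names.
cars1 <- cbind(model = rownames(mtcars), mtcars)[1:10, 1:5]

# Illustrative path only; the issue itself writes to data/mtcars1.csv.
path <- file.path(tempdir(), "mtcars1.csv")
write.csv(cars1, path, row.names = FALSE)

# The header now starts with a named column rather than an empty string.
header <- readLines(path, n = 1)
header
```

With a named first column there is nothing left to fill in, so prep_attributes() should report `model` where it reported X1 above.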

Then I added some filler information to the metadata. Here are those files zipped: metadata.zip

edit_access()
edit_attributes()
edit_biblio()
edit_creators()

In this example biblio.csv, I added a second row for "mtcars2" with a right click in the Shiny app, as suggested. It looks like this:

read.csv('data/metadata/biblio.csv')

#     title description datePublished
# 1 mtcars 1          NA          1974
# 2 mtcars 2          NA          1974

#                                              citation
# 1 Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391–411.
# 2 Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391–411.

#   keywords license funder geographicDescription northBoundCoord
# 1       NA      NA     NA                    NA              47
# 2       NA      NA     NA                    NA              57

#   eastBoundCoord southBoundCoord westBoundCoord wktString  startDate
# 1            -98              32           -120        NA 1974-01-01
# 2            -88              42           -110        NA 1974-01-01
 
#     endDate
# 1 1975-01-01
# 2 1975-01-01

Challenges

In write_spice(), we get a warning from the is.na(biblio$keywords) check, which expects biblio to have a single row and therefore a single keywords value.

https://github.com/ropensci/dataspice/blob/main/R/write_spice.R#L67

write_spice()
Warning message:
In if (is.na(biblio$keywords)) { :
  the condition has length > 1 and only the first element will be used
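The warning can be reproduced without dataspice. The sketch below uses a stand-in data frame with the same two-row shape as the biblio.csv above. The `all()` collapse is only one possible way to handle several rows, not necessarily what write_spice() should do. Note that from R 4.2.0 onwards a condition of length > 1 in `if()` is an error rather than a warning.

```r
# Stand-in for the two-row biblio table from this issue (keywords left empty).
biblio <- data.frame(
  title = c("mtcars 1", "mtcars 2"),
  keywords = c(NA_character_, NA_character_)
)

# is.na() is vectorised, so it returns one value per row of biblio.
keywords_missing <- is.na(biblio$keywords)
keywords_missing

# if() wants a single TRUE/FALSE, so collapse the vector explicitly.
# all() is an assumption here: "treat keywords as absent only if every row lacks them".
if (all(keywords_missing)) {
  message("No keywords supplied for any dataset")
}
```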

In build_site(), we get an error when parsing the boxes described in data/metadata/biblio.csv. I expected this to simply generate two boxes, rather than the single box we get with a single dataset.

build_site()

# Error: Failed to parse box in spatialCoverage$geo$box of '47 -98 32 -12057 -88 42 -110'. 

If you remove the second set of north/east/south/west coordinates, the same error occurs:

build_site()

# Error: Failed to parse box in spatialCoverage$geo$box of '47 -98 32 -120NA NA NA NA'. 

This error surfaces in build_site() but originates in write_spice() (L88), because the spatialCoverage it writes is unexpectedly a list of length 2.

write_spice()

# In dataspice.json
# ...
#  "spatialCoverage": {
#     "type": "Place",
#     "name": [null, null],
#     "geo": {
#       "type": "GeoShape",
#       "box": ["47 -98 32 -120", "57 -88 42 -110"]
#    }
#  }

Within build_site(), the error is raised by the length(tokens) == 1 check in parse_GeoShape_box().

biblio <- read.csv('data/metadata/biblio.csv')

box <- paste(biblio$northBoundCoord, biblio$eastBoundCoord,
            biblio$southBoundCoord, biblio$westBoundCoord)
box

# [1] "47 -98 32 -120" "57 -88 42 -110"

tokens <- stringr::str_split(box, " ")

tokens

# [[1]]
# [1] "47"   "-98"  "32"   "-120"

# [[2]]
# [1] "57"   "-88"  "42"   "-110"

if (!length(tokens) == 1) {
  stop("Failed to parse box in spatialCoverage$geo$box of '", 
       box, "'.", call. = FALSE)
}
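For discussion, here is a rough sketch of a check that accepts several boxes by validating each string on its own instead of requiring exactly one. `parse_boxes()` is a hypothetical helper, not an existing dataspice function, and it assumes the "north east south west" order used in the paste() call above.

```r
# Hypothetical helper, not part of dataspice: parse one or more
# "north east south west" strings into a numeric matrix, one row per box.
parse_boxes <- function(box) {
  tokens <- strsplit(box, " ", fixed = TRUE)

  # Each box must split into exactly four tokens.
  ok <- lengths(tokens) == 4
  if (!all(ok)) {
    stop("Failed to parse box in spatialCoverage$geo$box of '",
         paste(box[!ok], collapse = "', '"), "'.", call. = FALSE)
  }

  out <- do.call(rbind, lapply(tokens, as.numeric))
  colnames(out) <- c("north", "east", "south", "west")
  out
}

# The two boxes from the biblio.csv in this issue.
boxes <- parse_boxes(c("47 -98 32 -120", "57 -88 42 -110"))
boxes
```

Each row of `boxes` could then be drawn as its own rectangle. Whether build_site() should do this, or write_spice() should emit one Place per dataset, is an open design question.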