inbo/camtraptor

Be smarter in detecting version

read_camtrap_dp() currently detects the version by comparing package$profile against literal strings:

# get package version
profile <- package$profile
if (profile == "https://raw.githubusercontent.com/tdwg/camtrap-dp/1.0/camtrap-dp-profile.json") {
  version <- "1.0"
} else if (profile == "https://raw.githubusercontent.com/tdwg/camtrap-dp/0.1.6/camtrap-dp-profile.json") {
  version <- "0.1.6"
} else {
  # unknown profile: pass it on as-is, it will be reported as unsupported
  version <- profile
}

Packages published through GBIF, however, won't have the profile:

"https://raw.githubusercontent.com/tdwg/camtrap-dp/1.0/camtrap-dp-profile.json"

Instead, they will have one of:

"https://rs.gbif.org/sandbox/data-packages/camtrap-dp/1.0/profile/camtrap-dp-profile.json"
# or
"https://rs.gbif.org/data-packages/camtrap-dp/1.0/profile/camtrap-dp-profile.json"

As a result, read_camtrap_dp() says their version is not supported.

I think the profile code should check:

Does profile contain camtrap-dp-profile.json?
-> No: pass entire profile as version (will error)
-> Yes: continue

Does profile match a regex for digits separated by dots (at most 3 components)?
-> No: pass entire profile as version (will error)
-> Yes: pass the extracted match to the supported-versions check (might error)
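
The two checks above could be sketched roughly as follows. This is only an illustration: detect_version() is a hypothetical helper, not an existing camtraptor function. It uses base R rather than stringr, and its regex (at least one dot, at most three components) is just one possible reading of "digits separated by dot".

```r
# Hypothetical helper illustrating the proposed two-step check.
# Returns the extracted version, or the full profile when no version
# can be detected (so the caller's "unsupported version" error fires).
detect_version <- function(profile, pattern = "\\d+(\\.\\d+){1,2}") {
  # Step 1: the profile must point to a camtrap-dp-profile.json file.
  if (!grepl("camtrap-dp-profile.json", profile, fixed = TRUE)) {
    return(profile)
  }
  # Step 2: look for the first version-like match in the profile.
  m <- regexpr(pattern, profile, perl = TRUE)
  if (as.integer(m) == -1L) {
    return(profile)
  }
  regmatches(profile, m)
}

detect_version("https://rs.gbif.org/data-packages/camtrap-dp/1.0/profile/camtrap-dp-profile.json")
#> [1] "1.0"
```

Returning the full profile in both "No" branches keeps the existing behaviour, where an unrecognised value is rejected by the supported-versions check further down.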

Thanks @peterdesmet for the suggestion.

What do you think about these regex rules? Note, however, that pattern returns the very first number it finds in the string, even if that number is not followed by a dot; see the last two examples, profile3 and profile4. That's why I think the second regex, pattern_improved, is better. Its downside is that the version must contain at least one dot, otherwise NA is returned (see profile3). However, this downside is far less serious than that of the first regex.

library(stringr)

pattern <- "\\d+(\\.\\d+){0,2}"
pattern_improved <- "\\d+(\\.\\d+){1,2}"

profile1 <- "a/b/c/10.12.5/camera/etc/camtrap-dp-profile.json"
version1 <- stringr::str_extract(profile1, pattern)
version1_improved <- stringr::str_extract(profile1, pattern_improved)
version1
#> [1] "10.12.5"
version1_improved
#> [1] "10.12.5"

profile2 <- "a/b/c/d1.0d/cam/camtrap-dp-profile.json"
version2 <- stringr::str_extract(profile2, pattern)
version2_improved <- stringr::str_extract(profile2, pattern_improved)
version2
#> [1] "1.0"
version2_improved
#> [1] "1.0"


profile3 <- "a/b/c/d1d/cam/v2/camtrap-dp-profile.json"
version3 <- stringr::str_extract(profile3, pattern)
version3_improved <- stringr::str_extract(profile3, pattern_improved)
version3
#> [1] "1"
version3_improved
#> [1] NA

profile4 <- "a/b/c/d1d/cam/3.0.5/camtrap-dp-profile.json"
version4 <- stringr::str_extract(profile4, pattern)
version4_improved <- stringr::str_extract(profile4, pattern_improved)
version4
#> [1] "1"
version4_improved
#> [1] "3.0.5"

Created on 2024-01-15 with reprex v2.0.2

Nice, I would go for the improved version. It is reasonable to expect that the Camtrap DP version number will always contain a dot.
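
As a sanity check, both patterns can also be run against the real profile URLs quoted in this issue. The sketch below uses base R only. extract_first() is a hypothetical stand-in for stringr::str_extract(), written here just to keep the example dependency-free.

```r
# Hypothetical base R stand-in for stringr::str_extract():
# returns the first match per element, or NA when there is none.
extract_first <- function(x, pattern) {
  m <- regexpr(pattern, x, perl = TRUE)
  out <- rep(NA_character_, length(x))
  out[m != -1] <- regmatches(x, m)
  out
}

pattern <- "\\d+(\\.\\d+){0,2}"
pattern_improved <- "\\d+(\\.\\d+){1,2}"

# Profile URLs taken from this issue.
profiles <- c(
  "https://raw.githubusercontent.com/tdwg/camtrap-dp/1.0/camtrap-dp-profile.json",
  "https://raw.githubusercontent.com/tdwg/camtrap-dp/0.1.6/camtrap-dp-profile.json",
  "https://rs.gbif.org/sandbox/data-packages/camtrap-dp/1.0/profile/camtrap-dp-profile.json",
  "https://rs.gbif.org/data-packages/camtrap-dp/1.0/profile/camtrap-dp-profile.json"
)

extract_first(profiles, pattern)
#> [1] "1.0"   "0.1.6" "1.0"   "1.0"
extract_first(profiles, pattern_improved)
#> [1] "1.0"   "0.1.6" "1.0"   "1.0"
```

For these URLs the two patterns agree, so switching to pattern_improved should not change the result for the known GitHub and GBIF profiles. It only changes the outcome for strings like profile3 and profile4 above.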