Be smarter in detecting version
read_camtrap_dp() currently detects the version by literal string comparison against package$profile:
camtraptor/R/read_camtrap_dp.R
Lines 85 to 95 in 4b2f513
Packages published through GBIF, however, won't have the profile:
"https://raw.githubusercontent.com/tdwg/camtrap-dp/1.0/camtrap-dp-profile.json"
But:
"https://rs.gbif.org/sandbox/data-packages/camtrap-dp/1.0/profile/camtrap-dp-profile.json"
# or
"https://rs.gbif.org/data-packages/camtrap-dp/1.0/profile/camtrap-dp-profile.json"
As a result, read_camtrap_dp() says their version is not supported.
I think the profile code should check:
Does profile contain camtrap-dp-profile.json?
-> No: pass entire profile as version (will error)
-> Yes: continue
Does profile contain a regex match for digits separated by dots (max 3 iterations)?
-> No: pass entire profile as version (will error)
-> Yes: pass the extracted match to supported versions (might error)
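The two checks above could be sketched roughly as follows. This is only an illustration, not the camtraptor implementation: detect_version_sketch() is a hypothetical helper, the regex (at least one dot, at most three numeric parts) is an assumption, and base R is used so the snippet has no dependencies. It assumes profile is a single character string.

```r
# Hypothetical helper (not part of camtraptor): sketches the two checks above.
# Assumes `profile` is a single character string.
detect_version_sketch <- function(profile) {
  # Check 1: the profile must point to a Camtrap DP profile file.
  if (!grepl("camtrap-dp-profile.json", profile, fixed = TRUE)) {
    return(profile) # pass the entire profile on; the caller will reject it
  }
  # Check 2: look for digits separated by dots, e.g. "1.0" or "1.0.1".
  # Assumed pattern: at least one dot, at most three numeric parts.
  found <- regmatches(
    profile,
    regexpr("\\d+(\\.\\d+){1,2}", profile, perl = TRUE)
  )
  if (length(found) == 0) {
    return(profile) # no version-like substring; pass the entire profile on
  }
  found # first version-like substring; still to be checked against supported versions
}

gbif_profile <-
  "https://rs.gbif.org/data-packages/camtrap-dp/1.0/profile/camtrap-dp-profile.json"
detect_version_sketch(gbif_profile)
#> [1] "1.0"
```

Returning the whole profile in both "No" branches keeps the error message informative, since the unsupported value shown to the user is exactly what the package declared.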
Thanks @peterdesmet for the suggestion.
What do you think about these regex rules? Notice, however, that pattern returns the very first number detected in the string, even if it is not followed by a dot. See the last two examples, with profile3 and profile4 respectively. That's why I think the second regex, pattern_improved, is better. The downside of this regex is that the version must contain at least one dot; otherwise NA is returned (see profile3). However, this downside is far less serious than the downside of the first regex.
library(stringr)
pattern <- "\\d+(\\.\\d+){0,2}"
pattern_improved <- "\\d+(\\.\\d+){1,2}"
profile1 <- "a/b/c/10.12.5/camera/etc/camtrap-dp-profile.json"
version1 <- stringr::str_extract(profile1, pattern)
version1_improved <- stringr::str_extract(profile1, pattern_improved)
version1
#> [1] "10.12.5"
version1_improved
#> [1] "10.12.5"
profile2 <- "a/b/c/d1.0d/cam/camtrap-dp-profile.json"
version2 <- str_extract(profile2, pattern)
version2_improved <- stringr::str_extract(profile2, pattern_improved)
version2
#> [1] "1.0"
version2_improved
#> [1] "1.0"
profile3 <- "a/b/c/d1d/cam/v2/camtrap-dp-profile.json"
version3 <- str_extract(profile3, pattern)
version3_improved <- stringr::str_extract(profile3, pattern_improved)
version3
#> [1] "1"
version3_improved
#> [1] NA
profile4 <- "a/b/c/d1d/cam/3.0.5/camtrap-dp-profile.json"
version4 <- str_extract(profile4, pattern)
version4_improved <- stringr::str_extract(profile4, pattern_improved)
version4
#> [1] "1"
version4_improved
#> [1] "3.0.5"
Created on 2024-01-15 with reprex v2.0.2
Nice, I would go for the improved version. It is reasonable to expect that the Camtrap DP version number will always contain a dot.
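As a quick sanity check, the improved pattern can be applied to the three profile URLs quoted at the top of this issue. This is a minimal sketch using base R's regmatches() and regexpr() rather than stringr, so it runs without extra packages. One difference to keep in mind: regmatches() drops non-matching elements, whereas stringr::str_extract() returns NA for them.

```r
# Same regex as pattern_improved in the reprex above.
pattern_improved <- "\\d+(\\.\\d+){1,2}"

# The three profile URLs mentioned in this issue.
profiles <- c(
  "https://raw.githubusercontent.com/tdwg/camtrap-dp/1.0/camtrap-dp-profile.json",
  "https://rs.gbif.org/sandbox/data-packages/camtrap-dp/1.0/profile/camtrap-dp-profile.json",
  "https://rs.gbif.org/data-packages/camtrap-dp/1.0/profile/camtrap-dp-profile.json"
)

# Extract the first version-like substring from each URL.
versions <- regmatches(profiles, regexpr(pattern_improved, profiles, perl = TRUE))
versions
#> [1] "1.0" "1.0" "1.0"
```

All three URLs yield the same version string, so the GitHub-hosted and GBIF-hosted profiles would be treated alike.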