Possibly export `ud_convert` function?
Closed this issue · 9 comments
I'm wondering if it's possible to export the currently unexported functionud_convert
?
Use case: I have a package for working with water quality data - about 15 million rows. It contains columns of data of mixed units (unit stored in a column), many of which need to be converted to another unit (stored in a different column). Because they're mixed, I have to vapply
or loop over the rows, thus calling set_units(set_units(x, unitA), unitB)
many times. Calling ud_convert()
directly is much faster:
library(units)
#> udunits system database from /usr/local/share/udunits
library(microbenchmark)
microbenchmark(
units:::ud_convert(1, "mg/L", "ug/L"),
set_units(set_units(1, "mg/L"), "ug/L")
)
#> Unit: microseconds
#> expr min lq mean median
#> units:::ud_convert(1, "mg/L", "ug/L") 26.051 41.6465 63.16936 59.328
#> set_units(set_units(1, "mg/L"), "ug/L") 2078.241 3441.7690 4155.80035 4327.640
#> uq max neval
#> 76.1755 134.01 100
#> 4775.0780 10895.10 100
A real-world example with a 2000-row sample of data:
library(units)
csv_file <- "https://gist.githubusercontent.com/ateucher/ec2caf91108639b8126a1b0349bee5a3/raw/28f7d02c41e0d1436da46f7319aa6a57065cdb1c/test.csv"
x <- read.csv(csv_file, stringsAsFactors = FALSE)
# Various combinations of units:
table(x$from_unit, x$to_unit)
#>
#> CFU/100mL g/m2 m m3/d mg/L t/d ueq/L ug/g ug/m3
#> % 0 0 0 0 2 0 0 2 0
#> E3m3/d 0 0 0 48 0 0 0 0 0
#> kg/d 0 0 0 0 0 7 0 0 0
#> L/min 31 0 0 0 0 0 0 0 0
#> mg/dm2/d 0 0 0 0 10 0 0 0 0
#> mg/kg 0 0 0 0 0 0 0 126 0
#> mg/L 0 0 0 0 0 0 28 2 0
#> mg/m3 0 0 0 0 0 0 0 0 1
#> mm 0 0 8 0 0 0 0 0 0
#> ng/L 0 0 0 0 1 0 0 0 0
#> ug/cm2 0 5 0 0 2 0 0 0 0
#> ug/L 0 0 0 0 1727 0 0 0 0
# wrapper function to test two ways of converting
convert <- function(value, from, to, fun) {
stopifnot(length(value) == length(from) && length(from) == length(to))
vapply(seq_along(value),
function(i) {
tryCatch(fun(value[i], from[i], to[i]),
error = function(e) NA_real_)
}, FUN.VALUE = double(1))
}
convert_units <- function(value, from, to) {
ret <- units::set_units(units::set_units(value, from, mode = "standard"), to, mode = "standard")
as.numeric(ret)
}
# using set_units
system.time(
test1 <- convert(x$val, x$from_unit, x$to_unit, convert_units)
)
#> user system elapsed
#> 7.230 0.663 8.725
# using ud_convert
system.time(
test2 <- convert(x$val, x$from_unit, x$to_unit, units:::ud_convert)
)
#> user system elapsed
#> 0.192 0.008 0.211
all.equal(test1, test2)
#> [1] TRUE
Created on 2021-02-16 by the reprex package (v1.0.0)
I was also experimenting with mixed_units()
, but while I could set the mixed units, I didn't see an obvious way to convert them. But it is entirely possible I'm missing something.
Hmmm, this is possibly a solution:
library(dplyr)
convert_units2 <- function(value, from, to) {
stopifnot(length(unique(from)) ==1)
stopifnot(length(unique(to)) ==1)
ret <- tryCatch(
units::set_units(units::set_units(value, from, mode = "standard"), to, mode = "standard"),
error = function(e) NA_real_
)
as.numeric(ret)
}
system.time(
test3 <- x %>%
group_by(from_unit, to_unit) %>%
mutate(new_val = convert_units2(val, from_unit[1], to_unit[1]))
)
#> user system elapsed
#> 0.058 0.002 0.061
all.equal(test2, test3$new_val)
#> [1] TRUE
Created on 2021-02-16 by the reprex package (v1.0.0)
set_units
provides a method for mixed_units
, so you don't need to loop over units:
from <- c("m/s", "km/h", "mg/L", "g")
to <- c("km/h", "m/s", "g/L", "kg")
x <- mixed_units(1:4, from)
set_units(x, to, mode="standard")
#> Mixed units: g/L (1), kg (1), km/h (1), m/s (1)
#> 3.6 [km/h], 0.5555556 [m/s], 0.003 [g/L], 0.004 [kg]
For reference, this would be the way to do it with mixed_units
:
x <- read.csv(csv_file, stringsAsFactors = FALSE)
x <- x[mapply(ud_are_convertible, x$to_unit, x$from_unit), ]
x$val <- with(x, mixed_units(val, from_unit))
x$new_val <- with(x, set_units(val, to_unit, mode="standard"))
I just dropped non-convertible units. Yours is a good approach too, faster in this case because many units are repeated. Thus, we could probably take this idea of grouping into the internals of set_units.mixed_units
to improve its performance.
Anyway, in answer to the original request, I don't think that exporting ud_convert
is a good idea, because it entirely defeats the purpose of the package.
This is brilliant, thanks @Enchufa2. I had missed the mixed_units
method for set_units
- and I do recognize that this request was outside the intent of the package. I think I'll go with the grouped approach.
ud_are_convertible
is also not exported... would you consider exporting that, or a variant that fits with the package philosophy?
Thanks for the quick response
I took a look at the thread referenced above and I do think your grouped approach is the best for your use case.
ud_are_convertible
is exported in the devel version.
I think so too. Thanks so much for the thorough responses.
@Enchufa2 sorry to pester you again on this. Do you have an idea of when the next version (with ud_are_convertible
exported) will be heading to CRAN?
I would like to give another pass to the list of issues and address some of them before going for another release.