r-quantities/units

Invalid unit name with install_unit() does not give an error

Opened this issue · 9 comments

I'm working with clinical laboratory data where a unit is defined for hemoglobin called "g%" (gram-percent) which is equal to 1 g/dL. While trying to create a new unit for it, I guessed that the unit name would not work because of the percent sign, but it didn't give an error when trying to install the unit; it only gave an error when trying to use the unit.

It would be helpful if install_unit() did a check to confirm that the unit was successfully installed before returning.

library(units)
#> udunits database from C:/Users/Bill Denney/Documents/R/win-library/4.1/units/share/udunits/udunits2.xml
install_unit(symbol="g_hemoglobin")
install_unit(symbol="mol_hemoglobin", def="100/0.006206 g_hemoglobin")
install_unit(symbol="g%_hemoglobin", def="1 g_hemoglobin/dL")
foo <- set_units(10, "g_hemoglobin/dL", mode="standard")
set_units(foo, "mmol_hemoglobin/L", mode="standard")
#> 6.206 [mmol_hemoglobin/L]
set_units(foo, "g%_hemoglobin", mode="standard")
#> Error: In 'g%_hemoglobin', 'g%_hemoglobin' is not recognized by udunits.
#> 
#> See a table of valid unit symbols and names with valid_udunits().
#> Custom user-defined units can be added with install_unit().
#> 
#> See a table of valid unit prefixes with valid_udunits_prefixes().
#> Prefixes will automatically work with any user-defined unit.

Created on 2021-10-13 by the reprex package (v2.0.1)

To be clear, I don't think that the units library should necessarily support "g%_hemoglobin". The request is just that an error is generated.

Agree. The thing is... the udunits2 call succeeds, but then the unit is not available. I don't know what happens internally, but I suppose it has something to do with the fact that the percentage symbol is part of the grammar. So this issue should be raised upstream.

Actually, it succeeds because it is permitted and it works:

#include <stdio.h>
#include <udunits2/udunits2.h>

int main() {
    ut_set_error_message_handler((ut_error_message_handler) ut_ignore);
    ut_system *sys = ut_read_xml(NULL);
    ut_encoding enc = UT_UTF8;
    ut_set_error_message_handler((ut_error_message_handler) vprintf);
    
    // install g_hemoglobin
    ut_unit *g_hemoglobin = ut_new_base_unit(sys);
    ut_map_symbol_to_unit("g_hemoglobin", enc, g_hemoglobin);
    ut_map_unit_to_symbol(g_hemoglobin, "g_hemoglobin", enc);

    // install g%_hemoglobin
    ut_unit *gpc_hemoglobin = ut_parse(sys, "1 g_hemoglobin/dL", enc);
    ut_map_symbol_to_unit("g%_hemoglobin", enc, gpc_hemoglobin);
    ut_map_unit_to_symbol(gpc_hemoglobin, "g%_hemoglobin", enc);

    // conversion
    char *from = "g_hemoglobin/dL", to[128];
    ut_unit *from_u = ut_parse(sys, from, enc);
    cv_converter *cv = ut_get_converter(from_u, gpc_hemoglobin);
    ut_format(gpc_hemoglobin, to, sizeof(to), enc);
    double x = 10;
    printf("From: %f %s\n", x, from);
    printf("To  : %f %s\n", cv_convert_double(cv, x), to);
    
    cv_free(cv);
    ut_free(from_u);
    ut_free(g_hemoglobin);
    ut_free(gpc_hemoglobin);
    ut_free_system(sys);
    return 0;
}

Save this as test.c and then

$ gcc test.c -l udunits2 && ./a.out
From: 10.000000 g_hemoglobin/dL
To  : 10.000000 g%_hemoglobin

So this means that we are doing something wrong with that percentage symbol somewhere.

However:

#include <stdio.h>
#include <udunits2/udunits2.h>

int main() {
    ut_set_error_message_handler((ut_error_message_handler) ut_ignore);
    ut_system *sys = ut_read_xml(NULL);
    ut_encoding enc = UT_UTF8;
    ut_set_error_message_handler((ut_error_message_handler) vprintf);
    
    // install g_hemoglobin
    ut_unit *g_hemoglobin = ut_new_base_unit(sys);
    ut_map_symbol_to_unit("g_hemoglobin", enc, g_hemoglobin);
    ut_map_unit_to_symbol(g_hemoglobin, "g_hemoglobin", enc);

    // install g%_hemoglobin
    ut_unit *gpc_hemoglobin = ut_parse(sys, "1 g_hemoglobin/dL", enc);
    ut_map_symbol_to_unit("g%_hemoglobin", enc, gpc_hemoglobin);
    ut_map_unit_to_symbol(gpc_hemoglobin, "g%_hemoglobin", enc);

    // conversion works
    char *from = "g_hemoglobin/dL", to[128];
    ut_unit *from_u = ut_parse(sys, from, enc);
    cv_converter *cv = ut_get_converter(from_u, gpc_hemoglobin);
    ut_format(gpc_hemoglobin, to, sizeof(to), enc);
    double x = 10;
    printf("From: %f %s\n", x, from);
    printf("To  : %f %s\n", cv_convert_double(cv, x), to);

    // parsing doesn't work
    ut_unit *u = ut_parse(sys, to, enc);
    if (!u) printf("NULL unit!\n");
    
    cv_free(cv);
    ut_free(from_u);
    ut_free(g_hemoglobin);
    ut_free(gpc_hemoglobin);
    ut_free_system(sys);
    return 0;
}
$ gcc test.c -l udunits2 && ./a.out
From: 10.000000 g_hemoglobin/dL
To  : 10.000000 g%_hemoglobin
NULL unit!

Thanks for reporting it upstream!

Since there doesn't appear to be movement upstream, would it be reasonable to add some form of test here to see if it works? Perhaps within install_units() add a test like set_units(1, new_unit_name), see if it succeeds, and if not raise an informative error to the user?

The thing is that, as shown in the reprex, installation and even conversion works. It's parsing after the unit was installed what has a bug and doesn't work, and it's only for these cases with a percentage in the name. So not a fan of adding this test just for a bug in an edge case which in turn succeeded (and the database of units is already "polluted" with an unparseable identifier).

We will just not allow percentage characters in symbols/names until this issue is properly solved upstream (implemented in c9e958f):

units::install_unit(symbol="g%_hemoglobin", def="1 g_hemoglobin/dL")
#> Error in units::install_unit(symbol = "g%_hemoglobin", def = "1 g_hemoglobin/dL"): 
#>   Symbols/names with a percentage character '%' are not allowed (see https://github.com/r-quantities/units/issues/289)

Unfortunately, the UDUNITS2 grammar is more complex. As I read it, it allows a percent sign by itself but not a percent sign as part of a longer units string (https://www.unidata.ucar.edu/software/udunits/udunits-2.1.24/udunits2lib.html#Grammar). So, this would break the percent ("%") unit.

I'm guessing that it does not make sense for you to do a full grammar check before installing the unit. So, this may be a better thing to wait until fixed upstream.

No, this doesn't break anything. We are just disabling any new user-provided symbol or name with a percentage sign in it. The existing percent unit is safely loaded from the system XML.