unitsofmeasurement/uom-systems

UCUMFormatter fails to parse CLDR.LITER_PER_100KILOMETRES correctly

feluso opened this issue · 2 comments

@Test
    void testRoundTripUCUMCSCLDRLiterPer100KM()
    {
        final String formatted = UCUMFormat.getInstance(UCUMFormat.Variant.CASE_SENSITIVE).format(systems.uom.unicode.CLDR.LITER_PER_100KILOMETERS);
        final Unit<? extends Quantity<?>> parsedUnit = UCUMFormat.getInstance(UCUMFormat.Variant.CASE_SENSITIVE).parse(formatted);
        System.out.print(parsedUnit);
    }

This test would fail with the following error:

systems.uom.ucum.internal.format.TokenException
	at systems.uom.ucum.internal.format.UCUMFormatParser.SimpleUnit(UCUMFormatParser.java:207)
	at systems.uom.ucum.internal.format.UCUMFormatParser.Annotatable(UCUMFormatParser.java:180)
	at systems.uom.ucum.internal.format.UCUMFormatParser.Component(UCUMFormatParser.java:122)
	at systems.uom.ucum.internal.format.UCUMFormatParser.Term(UCUMFormatParser.java:77)
	at systems.uom.ucum.internal.format.UCUMFormatParser.parseUnit(UCUMFormatParser.java:67)
	at systems.uom.ucum.format.UCUMFormat$Parsing.parse(UCUMFormat.java:508)
	at systems.uom.ucum.format.UCUMFormat$Parsing.parse(UCUMFormat.java:526)

Upon closer inspection using a debugger, we can see the values in UCUMFormatParser

imagen

It seems the reason it fails it's because it parses 100 as h from HECTO but as this was already a km, the prefix only catches the h and not the k from KILO which causes it to be unable to recover the m unit from the symbols map, get a null instead and throw a TokenException

keilw commented

Thanks @feluso for pointing that out. HECTO and KILO should not be chained, in fact ICU4J (which is closely matched by the Unicode module here) defines in MeasureUnit.Complexity, that the recently declared MeasureUnit.MeasurePrefix must not apply to COMPOUND units like meter-per-second, kilowatt-hour, kilogram-meter-per-square-second, etc.

We may impose similar constraints and checks because e.g. HECTO(KILOMETRE_PER_HOUR) or even HECTO(KILOMETRE) as mentioned here could both lead to similar problems. And while there currently is no exception thrown, chaining prefix operations either explicitly or implicitly (if you use an already prefixed unit) could be problematic.

So you came across 100 automatically being turned into HECTO?

Seems related to #177

thanks for the explanation, that makes sense, the constraints seem appropiate, and regarding #177 while similar, this seems (as the replies suggest) that it's more over the fact they seem to be flipped, as noted in tech.units.indriya.AbstractUnit:406:

imagen