xarray-contrib/cf-xarray

CF unit formatter incompatible with latest pint

aulemahal opened this issue · 5 comments

With pint 0.24, the formatting machinery changed a lot. In 0.24.0, our formatter broke because we couldn't re-create a Unit object: the registry was None (see hgrecco/pint#2009). That was fixed, but another bugfix broke the specific case of a dimensionless variable (see hgrecco/pint#2024).

MWE with cf-xarray

from cf_xarray import units

u = units.units.parse_units("")
f"{u:cf}"

raises the error shown in the pint issue.

I realized our method was a bit complicated and could be simplified by using pint's formatting tools instead of a regex. However, I haven't found a backwards-compatible solution. Here are the 3 solutions I found:

Custom formatter class

Idea suggested by @hgrecco

from pint.delegates.formatter.plain import DefaultFormatter
from pint.delegates.formatter._compound_unit_helpers import prepare_compount_unit
from pint import formatter

class CFUnitFormatter(DefaultFormatter):
    def format_unit(self, unit, uspec, sort_func, **babel_kwds):
        numerator, denominator = prepare_compount_unit(
            unit,
            "~D",
            sort_func=sort_func,
            **babel_kwds,
            registry=self._registry,
        )

        out = formatter(
            numerator,
            denominator,
            as_ratio=False,
            product_fmt=" ",
            power_fmt="{}{}",
            parentheses_fmt=r"{}",
        )
        out = out.replace("Δ°", "delta_deg")
        return out.replace("percent", "%")

# Add to the registry
units.formatter._formatters['cf'] = CFUnitFormatter
  • Pro: User code does not need modification.
  • Con: Requires pinning pint >=0.24, and will eventually break when prepare_compount_unit is modified (see comment in issue).
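To make the target string concrete, here is a pint-free sketch of what a call like `formatter(..., as_ratio=False, product_fmt=" ", power_fmt="{}{}")` is expected to assemble from (symbol, exponent) pairs, followed by the same two replacements as above. It assumes the symbols are already shortened and that, as in pint's product mode, positive powers come before negative ones and an exponent of 1 is omitted. `cf_join` is a hypothetical helper, not part of pint or cf-xarray.

```python
def cf_join(items):
    """Assemble a CF-style unit string from (symbol, exponent) pairs.

    Hypothetical helper that roughly mimics pint's formatter with
    as_ratio=False, product_fmt=" " and power_fmt="{}{}"; it is not pint code.
    """
    pos_terms, neg_terms = [], []
    for symbol, exp in items:
        # An exponent of 1 is written as the bare symbol, others are appended.
        term = symbol if exp == 1 else f"{symbol}{exp:g}"
        (pos_terms if exp > 0 else neg_terms).append(term)
    out = " ".join(pos_terms + neg_terms)
    # Same post-processing as in the formatter above.
    out = out.replace("Δ°", "delta_deg")
    return out.replace("percent", "%")


print(cf_join([("s", -1), ("m", 1)]))             # m s-1
print(cf_join([("kg", 1), ("m", 2), ("s", -2)]))  # kg m2 s-2
print(cf_join([("Δ°C", 1)]))                      # delta_degC
```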

Simpler custom function

import pint
from pint import formatter

@pint.register_unit_format("cf")
def short_formatter(unit, registry, **options):
    num = [(u, e) for u, e in unit.items() if e >= 0]
    den = [(u, e) for u, e in unit.items() if e < 0]
    # in pint < 0.24, the first argument of `formatter` is num + den
    # a test with packaging.version could make this work for all pint versions
    out = formatter(
        num,
        den,
        as_ratio=False,
        product_fmt=" ",
        power_fmt="{}{}"
    )

    out = out.replace("Δ°", "delta_deg")
    return out.replace("percent", "%")

The trouble here is that we don't have control over the "shortening" of the units: what was previously f"{u:cf}" must become f"{u:~cf}". And (of course!) f"{u:~cf}" will not work with previous versions of cf_xarray.

  • Pro: Simple. Could be made to work with previous versions of pint by checking the pint version, e.g. with packaging.version.
  • Con: Requires users to change their code.
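The "shortening" problem can be illustrated without pint: the `~` modifier is what swaps full names like `meter` for symbols like `m` before formatting, so a formatter that receives unshortened names produces a different string. In this sketch, `SYMBOLS` is a hypothetical stand-in for the registry's symbol lookup (the final proposal in this thread uses `registry._get_symbol` for that), and `join_cf` and `shorten` are illustrative helpers only.

```python
# Hypothetical name -> symbol table; in pint this lookup is done by the registry.
SYMBOLS = {"meter": "m", "second": "s", "kilogram": "kg", "millimeter": "mm"}


def join_cf(items):
    """Join (name, exponent) pairs CF-style, positive powers first: "a b2 c-1"."""
    pos = [n if e == 1 else f"{n}{e:g}" for n, e in items if e > 0]
    neg = [f"{n}{e:g}" for n, e in items if e < 0]
    return " ".join(pos + neg)


def shorten(items):
    """Replace full unit names by their symbols, leaving unknown names untouched."""
    return [(SYMBOLS.get(name, name), exp) for name, exp in items]


velocity = [("meter", 1), ("second", -1)]
print(join_cf(velocity))           # meter second-1  (full names)
print(join_cf(shorten(velocity)))  # m s-1           (symbols)
```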

Hack and forget

Simply fix the dimensionless issue and don't think about it anymore.

    import pint

    @pint.register_unit_format("cf")
    def short_formatter(unit, registry, **options):
        import re

        # avoid issue in pint >= 0.24.1
        unit = unit._units.__class__({k.replace('dimensionless', ''): v for k, v in unit._units.items()})
        # convert UnitContainer back to Unit
        unit = registry.Unit(unit)
        # Print units using abbreviations (millimeter -> mm)
        s = f"{unit:~D}"

        # Search and replace patterns
        pat = r"(?P<inverse>(?:1 )?/ )?(?P<unit>\w+)(?: \*\* (?P<pow>\d))?"

        def repl(m):
            i, u, p = m.groups()
            p = p or (1 if i else "")
            neg = "-" if i else ""

            return f"{u}{neg}{p}"

        out, n = re.subn(pat, repl, s)

        # Remove multiplications
        out = out.replace(" * ", " ")
        # Delta degrees:
        out = out.replace("Δ°", "delta_deg")
        return out.replace("percent", "%")
  • Pro: Only fails with 0.24.0; user code does not need to change.
  • Con: Still weirdly hacky.
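The regex step of this hack can be exercised on its own. The sketch below copies the pattern and the replacement function from the formatter above and applies them to strings shaped like pint's abbreviated `~D` output (e.g. `"kg * m ** 2 / s ** 2"`). Those sample inputs are assumptions about what that output looks like, not captured pint results, and `pint_to_cf` is a hypothetical wrapper. Note that `(?P<pow>\d)` only captures a single digit.

```python
import re

# Same pattern as in the formatter above.
PAT = r"(?P<inverse>(?:1 )?/ )?(?P<unit>\w+)(?: \*\* (?P<pow>\d))?"


def repl(m):
    i, u, p = m.groups()
    # A divided unit without an explicit power gets exponent 1, made negative below.
    p = p or (1 if i else "")
    neg = "-" if i else ""
    return f"{u}{neg}{p}"


def pint_to_cf(s):
    """Convert a pint "~D"-style string such as "m / s" to CF style ("m s-1")."""
    out = re.sub(PAT, repl, s)
    out = out.replace(" * ", " ")          # remove multiplications
    out = out.replace("Δ°", "delta_deg")   # delta degrees
    return out.replace("percent", "%")


for s in ["m / s", "kg * m ** 2 / s ** 2", "1 / s", "Δ°C"]:
    print(s, "->", pint_to_cf(s))
```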

I found something that works in all cases!

import pint
from packaging.version import Version

@pint.register_unit_format("cf")
def short_formatter(unit, registry, **options):
    # pint 0.24.1 gives this for dimensionless units
    if unit == {'dimensionless': 1}:
        return ""

    # Shorten the names
    unit = pint.util.UnitsContainer({
        registry._get_symbol(u): exp
        for u, exp in unit.items()
    })

    if Version(pint.__version__) < Version('0.24'):
        args = (unit.items(),)
    else:
        args = (
            ((u, e) for u, e in unit.items() if e >= 0),
            ((u, e) for u, e in unit.items() if e < 0),
        )

    out = pint.formatter(
        *args,
        as_ratio=False,
        product_fmt=" ",
        power_fmt="{}{}"
    )
    out = out.replace("Δ°", "delta_deg")
    return out.replace("percent", "%")

This should give the same result as before for all versions of pint (except 0.24.0, of course). In pint < 0.24, a dimensionless unit yields "dimensionless", because in older versions of pint the custom formatter isn't even called in that case. In pint 0.24.1, this changed, and the special case is handled at the beginning of this proposal.

I am using packaging.version.Version to ensure backward compatibility, which is not the most elegant approach but seems fine. I also assumed it was reasonable to use the "private" method registry._get_symbol.
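The control flow of this proposal (dimensionless special case first, then a version-dependent call signature) can be sketched without pint. Here `parse_version` is a deliberately simplified, hypothetical stand-in for `packaging.version.Version` that only handles plain numeric strings like "0.24.1", and `fake_formatter` is a hypothetical mimic of the two `pint.formatter` calling conventions, so only the dispatch logic is illustrated, not pint's actual output.

```python
def parse_version(v):
    """Hypothetical, simplified stand-in for packaging.version.Version."""
    return tuple(int(part) for part in v.split("."))


def fake_formatter(*args):
    """Mimic both conventions: formatter(items) or formatter(numerator, denominator)."""
    items = list(args[0]) if len(args) == 1 else list(args[0]) + list(args[1])
    return " ".join(u if e == 1 else f"{u}{e:g}" for u, e in items)


def cf_format(unit, pint_version):
    # pint 0.24.1 passes {'dimensionless': 1} for dimensionless units.
    # (In real pint < 0.24, the custom formatter is not called for them at all.)
    if unit == {"dimensionless": 1}:
        return ""
    if parse_version(pint_version) < (0, 24):
        args = (unit.items(),)
    else:
        args = (
            ((u, e) for u, e in unit.items() if e >= 0),
            ((u, e) for u, e in unit.items() if e < 0),
        )
    out = fake_formatter(*args)
    out = out.replace("Δ°", "delta_deg")
    return out.replace("percent", "%")


print(cf_format({"m": 1, "s": -1}, "0.23"))    # m s-1
print(cf_format({"m": 1, "s": -1}, "0.24.1"))  # m s-1
```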

Nice work! Happy to merge this change. Let's also add a pint!=0.24.0 version pin.

Just curious, why out = out.replace("Δ°", "delta_deg")?

The delta_deg has to do with https://pint.readthedocs.io/en/stable/user/nonmult.html, i.e. when two temperatures are subtracted, the result is not a temperature but rather a "delta", and the short symbol for that is the Greek letter Δ.
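A short worked example of why a difference of temperatures needs its own unit, in plain Python with a hard-coded offset rather than pint: converting the difference as if it were an absolute temperature adds the 273.15 offset a second time. `celsius_to_kelvin` is an illustrative helper.

```python
KELVIN_OFFSET = 273.15  # 0 °C expressed in kelvin


def celsius_to_kelvin(t_c):
    """Convert an absolute temperature from °C to K."""
    return t_c + KELVIN_OFFSET


t1, t2 = 25.0, 20.0  # two absolute temperatures in °C
delta_c = t1 - t2    # a temperature difference of 5 (Δ°C), not a temperature

# The same difference computed in kelvin: the offsets cancel, so 5 Δ°C == 5 K.
delta_k = celsius_to_kelvin(t1) - celsius_to_kelvin(t2)
print(delta_k)  # 5.0

# Wrong: treating the difference as an absolute temperature re-adds the offset.
print(celsius_to_kelvin(delta_c))  # 278.15
```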

Neither "Δ°C" nor "delta_degC" is recognized by udunits, so my guess is that this is to avoid Unicode problems down the line in a netCDF file?

I recently stumbled upon these deltas in xclim and added an automatic translation into the related absolute unit in the specific function (Δ°C -> K, Δ°F -> °R). But maybe that's a bit too strong for cf-xarray...
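A minimal string-level sketch of the translation described here, mapping a temperature-difference unit to the absolute unit with the same step size. It is hypothetical, not xclim's actual implementation, and only handles the two cases mentioned (Δ°C -> K, Δ°F -> °R), leaving other strings unchanged.

```python
# A difference of 1 °C equals 1 K, and a difference of 1 °F equals 1 °R,
# so a delta unit can be replaced by the corresponding absolute unit.
DELTA_TO_ABSOLUTE = {"Δ°C": "K", "Δ°F": "°R"}


def delta_to_absolute(unit_str):
    """Replace delta-temperature symbols by their absolute counterparts (hypothetical helper)."""
    for delta, absolute in DELTA_TO_ABSOLUTE.items():
        unit_str = unit_str.replace(delta, absolute)
    return unit_str


print(delta_to_absolute("Δ°C"))      # K
print(delta_to_absolute("Δ°F d-1"))  # °R d-1
print(delta_to_absolute("m s-1"))    # m s-1
```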

Let's add that as a comment in the code when you send in your PR.