CF unit formatter incompatible with latest pint
aulemahal opened this issue · 5 comments
With pint 0.24, the formatting machinery changed a lot. In 0.24.0 our formatter broke because we couldn't re-create Unit object, the registry was None (see hgrecco/pint#2009), this was fixed, but another bugfix broke the specific case of a dimensionless variable, see hgrecco/pint#2024.
MWE with cf-xarray
from cf_xarray import units
u = units.units.parse_units("")
f"{u:cf}"
raises the error shown in the pint issue.
I realized our method was a bit complicated and could be made simpler using pint formatting tools instead of regex. However, I haven't found a solution that was backwards-compatible. Here are 3 solutions I found:
Custom formatter class
Idea suggested by @hgrecco
from pint.delegates.formatter.plain import DefaultFormatter
from pint.delegates.formatter._compound_unit_helpers import prepare_compount_unit
from pint import formatter
class CFUnitFormatter(DefaultFormatter):
def format_unit(self, unit, uspec, sort_func, **babel_kwds):
numerator, denominator = prepare_compount_unit(
unit,
"~D",
sort_func=sort_func,
**babel_kwds,
registry=self._registry,
)
out = formatter(
numerator,
denominator,
as_ratio=False,
product_fmt=" ",
power_fmt="{}{}",
parentheses_fmt=r"{}",
)
out = out.replace("Δ°", "delta_deg")
return out.replace("percent", "%")`
# Add to the registry
units.formatter._formatters['cf'] = CFUnitFormatter
- Pro : User code does not need modification.
- Con : Requires pinning pint >=0.24, will break eventually when
prepare_compount_unit
is modified (see comment in issue).
Simpler custom function
@pint.register_unit_format("cf")
def short_formatter(unit, registry, **options):
num = [(u, e) for u, e in unit.items() if e >= 0]
den = [(u, e) for u, e in unit.items() if e < 0]
# in pint < 0.24, the first argument of `formatter` is num + den
# a test with packaging.version could make this work for all pint versions
out = formatter(
num,
den,
as_ratio=False,
product_fmt="{} {}",
power_fmt="{}{}"
)
out = out.replace("Δ°", "delta_deg")
return out.replace("percent", "%")
The trouble here is that we don't have control on the "shortening" of the units. What was previously f"{u:cf}"
must become f"{u:~cf}"
. And (of course!) f"{u:~cf}"
will not work with previous versions of cf_xarray
.
- Pro : Simple. Could be made to work with previous versions of pint by.
- Con : Requires users to change their code.
Hack and forget
Simply fix the dimensionless issue and don't think about it anymore.
@pint.register_unit_format("cf")
def short_formatter(unit, registry, **options):
import re
# avoid issue in pint >= 0.24.1
unit = unit._units.__class__({k.replace('dimensionless', ''): v for k, v in unit._units.items()})
# convert UnitContainer back to Unit
unit = registry.Unit(unit)
# Print units using abbreviations (millimeter -> mm)
s = f"{unit:~D}"
# Search and replace patterns
pat = r"(?P<inverse>(?:1 )?/ )?(?P<unit>\w+)(?: \*\* (?P<pow>\d))?"
def repl(m):
i, u, p = m.groups()
p = p or (1 if i else "")
neg = "-" if i else ""
return f"{u}{neg}{p}"
out, n = re.subn(pat, repl, s)
# Remove multiplications
out = out.replace(" * ", " ")
# Delta degrees:
out = out.replace("Δ°", "delta_deg")
return out.replace("percent", "%")
- Pro: Only fails with 0.24.0, User code needs not be changed.
- Con: Still weirdly hacky.
I found something that works in all cases!
@pint.register_unit_format("cf")
def short_formatter(unit, registry, **options):
# pint 0.24.1 gives this for dimensionless units
if unit == {'dimensionless': 1}:
return ""
# Shorten the names
unit = pint.util.UnitsContainer({
registry._get_symbol(u): exp
for u, exp in unit.items()
})
if Version(pint.__version__) < Version('0.24'):
args = (unit.items(),)
else:
args = (
((u, e) for u, e in unit.items() if e >= 0),
((u, e) for u, e in unit.items() if e < 0),
)
out = pint.formatter(
*args,
as_ratio=False,
product_fmt=" ",
power_fmt="{}{}"
)
out = out.replace("Δ°", "delta_deg")
return out.replace("percent", "%")
Should gives the same result as before for all versions of pint (except 0.24.0, of course). In pint < 0.24, a dimensionless unit will yield "dimensionless"
because in the older pints, the custom formatter isn't even called in that case. In pint 0.24.1, this was modified and the special case is managed at the beginning of this proposition.
I am using packaging.version.Version
to ensure backward compatibility, which is not the elegantest but seems fine. Also, I assumed it was reasonable to use the "private" function registry._get_symbol
.
Nice work! Happy to merge this change. Let's also add a pint!=0.24.0
version pin.
Just curious, why out = out.replace("Δ°", "delta_deg")
The delta_deg
has to do with https://pint.readthedocs.io/en/stable/user/nonmult.html, i.e. when two temperatures are subtracted, the result is not a temperature but rather a "delta". And the short symbol for that is the greek letter.
Both "Δ°C" and "delta_degC" are not recognized by udunits, so my guess is that this is to avoid unicode problems down the line in a netCDF ?
I recently stumbled upon these delta in xclim and I added an automatic translation into the related absolute unit in the specific function. (Δ°C -> K
, Δ°F -> °R
) But maybe that's a bit too strong for cf-xarray...
Let's add that as a comment in the code when you send in your PR.