asdf-format/asdf

Consider inspecting modules for class paths

Closed this issue · 0 comments

Currently asdf converters can express types as either classes or strings (interpreted as class paths).

Taking the AngleConverter in asdf-astropy as an example. It uses class paths:

class AngleConverter(QuantityConverter):
    tags = ("tag:astropy.org:astropy/coordinates/angle-*",)
    types = (
        "astropy.coordinates.angles.Angle",
        "astropy.coordinates.angles.core.Angle",
    )

Which have the benefit of not requiring that astropy.coordinates.angles.Angle be imported and created when the converter is proxied (which occurs when the ExtensionManager is created when AsdfFile.extension_manager is accessed for the first time).

Looking at the AngleConverter this creates an issue when upstream packages move their classes (note the inclusion of Angle and core.Angle) even if this move does not change the public location of the class (in this case astropy.coordinates.Angle which isn't listed above).

This has led to a number of errors (here are a few):

In addition to the converter breaking due to an apparent internal change in a dependency in the long run this can result in many class paths that may end up never being used (as eventually the older versions of the dependency will fall out of use).

If we were to import class paths to turn them into classes at the time of proxying this would mean that any conversion of custom to tagged tree (which occurs during validation and saving) would result in loading every module of every extension.

Currently the ExtensionManager will first prefer to use the object class in both handles_type:

return typ in self._converters_by_type or get_class_name(typ, instance=False) in self._converters_by_type

(which I think will be called first) and get_converter_for_type
try:
return self._converters_by_type[typ]
except KeyError:
class_name = get_class_name(typ, instance=False)
try:
return self._converters_by_type[class_name]

It might be worth considering how the fallback for failing to find a converter based on the class is handled to better account for changes like the above with Angle.

During custom_tree_to_tagged_tree (where the converter types are used) we can expect that the module containing the public and private interfaces of the type have already been imported (as they've already been used to construct the custom object). If a converter lookup based on class fails, we could fall back to looking for the closest matching converters that have not yet been indexed (ones that were registered by 'path') and index them by:

  • importing the class at the named path
  • registering the newly realized converter by class
  • checking if the class matches (using instance.__class__ is class)
  • if it matches, stop indexing converters
  • if it doesn't match, continue indexing until no more paths exist in the same module (or maybe all converters, see below)

For the above Angle example. As is, this should result in:

  • no converter being found for the class (as it's registered by path)
  • the class path is inspected (astropy.coordinates.angles.core.Angle)
  • the AngleConverter is found (with an exact match)
  • the private astropy.coordinates.angles.core.Angle class is imported
  • the AngleConverter is indexed using the imported class
  • the class matches instance.__class__ and the search ends

If the public class path (astropy.coordinates.Angle) were added (and all private class paths removed) in AngleConverter.types the following should occur:

  • no converter found for the class
  • the class path is inspected (astropy.coordinates.angles.core.Angle)
  • every converter with astropy.coordinates.angles.core is indexed (none exist)
  • every converter with astropy.coordinates.angles is indexed (none exist)
  • every converter with astropy.coordinates is indexed (many exist but all classes in this module have likely already been created)

If the private class implementation was not in a sub-module of the public location it's likely that all converters for a module would be indexed (this could be costly for a package that only creates sub-modules when they are directly imported, like astropy).

The above would not work if a private implementation was in a different package (if 'foo.Bar' was actually 'bam.Bar' searching would fail) unless we allow every converter to be indexed during searching.

The above change should continue to be compatible with the existing converters that use class paths and the fall-back of using the private class path (for the edge cases above) would continue to work.