GenericMappingTools/pygmt

New alias system towards a more Pythonic interface

seisman opened this issue · 3 comments

GMT's single-letter options (e.g. -B) are difficult to read/understand, so they are not recommended for use in PyGMT. Instead, PyGMT uses long-form parameters and the PyGMT alias system is responsible for translating PyGMT long-form parameters into the corresponding short-form GMT options. The alias system was originally implemented by @leouieda in aad12e0 (seven years ago!) and hasn't changed much since then. The alias system has some limitations and flaws that prevent us from achieving the project goal: "Build a Pythonic API for GMT". Now it's time to design a new alias system. This issue reviews the current alias system and proposes a new alias system. The initial implementation of the proposed new alias system is available for review in #3238.

The current alias system

Currently, the alias system looks like this:

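from pygmt.clib import Session
from pygmt.helpers import build_arg_list, fmt_docstring, kwargs_to_strings, use_alias


@fmt_docstring
@use_alias(
    R="region",
    B="frame",
    J="projection",
)
@kwargs_to_strings(R="sequence")
def func(self, **kwargs):
    r"""
    Plot a basemap.

    {aliases}
    """
    with Session() as lib:
        lib.call_module("basemap", args=build_arg_list(kwargs))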

The current alias system works in this way:

  1. The kwargs_to_strings decorator converts an argument to a string. The argument can be a string, a numeric value, or a sequence (e.g., converting region=[10, 20, 30, 40] to region="10/20/30/40").
  2. The use_alias decorator maps long-form PyGMT parameters (e.g., region) to short-form GMT options (e.g., R). The short-form options are then stored in kwargs (i.e., converting region="10/20/30/40" to kwargs["R"]="10/20/30/40").
  3. build_arg_list (previously build_arg_string) converts the kwargs dictionary to a list/string that the GMT API accepts.
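The three steps above can be emulated in a few lines of plain Python. This is a simplified, hypothetical sketch: the *_sketch helpers are not PyGMT functions, and the real decorators handle many more cases (positional arguments, repeated options, docstrings). One detail worth noting is that stacked decorators run outermost-first at call time, so use_alias (listed above kwargs_to_strings) renames region to R before the sequence is converted, which is why kwargs_to_strings is keyed on R.

```python
from collections.abc import Sequence


def use_alias_sketch(kwargs: dict, **aliases: str) -> dict:
    """Move long-form keys (e.g. 'region') to short-form keys (e.g. 'R')."""
    out = dict(kwargs)
    for short_name, long_name in aliases.items():
        if long_name in out:
            out[short_name] = out.pop(long_name)
    return out


def kwargs_to_strings_sketch(kwargs: dict, **conversions: str) -> dict:
    """Join sequence values of the given short-form keys with '/'."""
    out = dict(kwargs)
    for key, kind in conversions.items():
        value = out.get(key)
        if kind == "sequence" and isinstance(value, Sequence) and not isinstance(value, str):
            out[key] = "/".join(str(v) for v in value)
    return out


def build_arg_list_sketch(kwargs: dict) -> list[str]:
    """Turn {'R': '10/20/30/40', 'B': True} into ['-B', '-R10/20/30/40']."""
    args = []
    for key in sorted(kwargs):
        value = kwargs[key]
        if value is None or value is False:
            continue  # unset options are dropped
        args.append(f"-{key}" if value is True else f"-{key}{value}")
    return args


kwargs = {"region": [10, 20, 30, 40], "frame": True, "projection": "X10c"}
kwargs = use_alias_sketch(kwargs, R="region", B="frame", J="projection")
kwargs = kwargs_to_strings_sketch(kwargs, R="sequence")
print(build_arg_list_sketch(kwargs))  # ['-B', '-JX10c', '-R10/20/30/40']
```

After the first call, the long-form names are gone from kwargs, which is the root of limitation 4 below.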

The current alias system has some known limitations and flaws:

  1. Long arguments are difficult to read/write.

    Since each GMT option usually has many modifiers, some arguments are very long, and tab autocompletion is not possible.

    Here is an example from #1082:

    fig.logo(position="jTR+o0.3c/0.6c+w3c", box="+p1p+glightblue")
    

    The parameter names position and box are good, but their arguments are difficult to write/read. In #1082, some candidate solutions (dict, class or function) were proposed. Please refer to #1082 for detailed discussions.

  2. Short arguments are easy to write but difficult to read.

    For some options, GMT uses single-letter arguments. Here are two examples:

    1. In Figure.coast, resolution="f" is not readable; resolution="full" is more Pythonic.
    2. In pygmt.binstats, statistic="z" is not readable; statistic="sum" is more Pythonic.

    To support Pythonic long-form arguments, we can use a dictionary that maps long-form arguments to short-form arguments. In the current alias system, this requires a lot of coding effort; see #3012 and #3013.
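For illustration, such per-parameter dictionaries could look like the hypothetical sketch below. The resolution codes are GMT's coast -D codes, and the statistic codes are the ones used in the Alias examples later in this issue; the helper to_short_form is illustrative only, and the real code in #3012/#3013 may be structured differently.

```python
# Hypothetical lookup tables: readable long-form argument -> GMT short code.
COAST_RESOLUTION = {
    "full": "f",
    "high": "h",
    "intermediate": "i",
    "low": "l",
    "crude": "c",
}
BINSTATS_STATISTIC = {"mean": "a", "mad": "d", "rms": "r", "sum": "z"}


def to_short_form(value: str, mapping: dict[str, str]) -> str:
    """Translate a long-form argument; GMT short codes pass through unchanged."""
    if value in mapping:
        return mapping[value]
    if value in mapping.values():
        return value  # already a GMT code, e.g. "f"
    raise ValueError(f"Invalid argument {value!r}; expected one of {list(mapping)}")


print(to_short_form("full", COAST_RESOLUTION))  # f
print(to_short_form("sum", BINSTATS_STATISTIC))  # z
```

The catch is that every such table, plus the code that applies it, has to be written by hand in each wrapper.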

  3. Abuse of the kwargs parameter.

    Short-form GMT options are stored in the keyword-argument dictionary kwargs, so every wrapper that uses the alias system must accept **kwargs as its last parameter.

  4. Can't access the original argument by the long-form parameter name inside the wrappers.

    The alias system is implemented as decorators, so all conversions/mappings are done outside of the wrappers. It means we can't access the original argument by the long-form parameter name in the wrappers.

    For example, in Figure.plot, S is aliased to style. To access the argument of style, we have to use kwargs.get("S").

    Another example: region=[10, 20, 30, 40] is converted to kwargs["R"]="10/20/30/40". If we want to get the region bounds in the wrapper, we have to do the inverse conversion: w, e, s, n = kwargs["R"].split("/").
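A small sketch makes the fragility of this round trip concrete. The helper region_bounds_from_kwargs is hypothetical; it simply performs the split shown above. GMT also accepts non-numeric -R forms such as g (shorthand for a global region), which cannot be split back into four numbers.

```python
def region_bounds_from_kwargs(kwargs: dict) -> tuple[float, float, float, float]:
    """Recover w, e, s, n from the already-converted GMT string in kwargs['R']."""
    w, e, s, n = (float(v) for v in kwargs["R"].split("/"))
    return w, e, s, n


print(region_bounds_from_kwargs({"R": "10/20/30/40"}))  # (10.0, 20.0, 30.0, 40.0)

# The string round trip loses information: "g" is a valid -R argument,
# but it cannot be parsed back into four bounds.
try:
    region_bounds_from_kwargs({"R": "g"})
except ValueError as err:
    print("cannot recover bounds:", err)
```

If the wrapper could read region directly, it would get back the original list and skip the parsing altogether.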

  5. Difficult to implement Pythonic high-level wrappers.

    Due to the design of GMT modules, each GMT module usually does too many things. For example, basemap and coast provide exactly the same options for adding a scale bar, a directional rose, and a magnetic rose. In #2831, we proposed to provide high-level wrappers that each do a single job. These high-level wrappers should have a Pythonic interface with many long-form parameters (see #2831 for the proposed API), but it's unclear how to translate so many parameters into GMT short-form options (it's possible, but it usually means a lot of if-else tests, e.g., #2130).

    Another related issue is #2797 for high-level wrappers of plot and plot3d.

The new alias system

Here, I propose a new alias system after half a year of design and coding (design takes more time than coding!). The new alias system is implemented in pygmt/alias.py of PR #3238.

The Alias class

The Alias class defines how to convert the argument of a long-form parameter to a string (or a sequence of strings) that can be passed to the GMT API.

In the example below, we define a parameter offset. Its value can be a number, a string, a sequence, or any object whose string representation (__str__) makes sense to GMT. If a sequence is given, it is joined into a string using the separator '/', and the prefix +o is added at the beginning of the string.

>>> from pygmt.alias import Alias
>>> par = Alias("offset", prefix="+o", separator="/")
>>> par.value = (2.0, 2.0)
>>> par.value
'+o2.0/2.0'

The Alias class has a value property implemented with a setter method, so the argument is converted as soon as Alias.value is assigned.

Here are more examples:

>>> from pygmt.alias import Alias
>>> par = Alias("frame")
>>> par.value = ("xaf", "yaf", "WSen")
>>> par.value
['xaf', 'yaf', 'WSen']

>>> par = Alias("resolution", mapping=True)
>>> par.value = "full"
>>> par.value
'f'

>>> par = Alias("statistic", mapping={"mean": "a", "mad": "d", "rms": "r", "sum": "z"})
>>> par.value = "mean"
>>> par.value
'a'
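To show how a setter-based value property can produce the outputs above, here is a minimal, hypothetical stand-in called MiniAlias. It is not the code in pygmt/alias.py; it only assumes the behaviors visible in the examples: a prefix, an optional separator (a sequence with no separator stays a list, as for frame), mapping=True taking the first letter, and a dict mapping.

```python
from collections.abc import Mapping, Sequence


class MiniAlias:
    """Hypothetical, simplified stand-in for the proposed pygmt.alias.Alias."""

    def __init__(self, name, prefix="", separator=None, mapping=None):
        self.name = name            # long-form parameter name, e.g. "offset"
        self.prefix = prefix        # GMT modifier prefix, e.g. "+o"
        self.separator = separator  # joins sequence items, e.g. "/"
        self.mapping = mapping      # True (use first letter), a dict, or None
        self._value = None

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, arg):
        # The conversion happens here, at assignment time.
        if arg is None or arg is False:
            self._value = None
            return
        if arg is True:
            arg = ""  # bare flag, e.g. frame=True -> "-B"
        elif self.mapping is True:
            arg = str(arg)[0]  # "full" -> "f"
        elif isinstance(self.mapping, Mapping):
            arg = self.mapping.get(arg, arg)  # "mean" -> "a"; unknown values pass through
        if isinstance(arg, Sequence) and not isinstance(arg, str):
            if self.separator is None:
                # No separator: keep a list, e.g. for a repeatable option like -B.
                self._value = [f"{self.prefix}{item}" for item in arg]
                return
            arg = self.separator.join(str(item) for item in arg)
        self._value = f"{self.prefix}{arg}"


offset = MiniAlias("offset", prefix="+o", separator="/")
offset.value = (2.0, 2.0)
print(offset.value)  # +o2.0/2.0

frame = MiniAlias("frame")
frame.value = ("xaf", "yaf", "WSen")
print(frame.value)  # ['xaf', 'yaf', 'WSen']

resolution = MiniAlias("resolution", mapping=True)
resolution.value = "full"
print(resolution.value)  # f
```

Doing the conversion in the setter means the stored value is always GMT-ready, while the caller's original object is left untouched.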

The AliasSystem class

The AliasSystem class is similar to the old use_alias decorator; it aliases GMT single-letter options to an Alias object or a list of Alias objects.

Here is an example:

>>> from pygmt.alias import Alias, AliasSystem
>>> from pygmt.helpers import build_arg_list
>>> def func(par0, par1=None, par2=None, par3=None, par4=None, frame=False, panel=None, **kwargs):
...     alias = AliasSystem(
...         A=[
...             Alias("par1"),
...             Alias("par2", prefix="+j"),
...             Alias("par3", prefix="+o", separator="/"),
...         ],
...         B=Alias("frame"),
...         c=Alias("panel", separator=","),
...     )
...     return build_arg_list(alias.kwdict)
...
>>> func("infile", par1="mytext", par3=(12, 12), frame=True, panel=(1, 2), J="X10c/10c")
['-Amytext+o12/12', '-B', '-JX10c/10c', '-c1,2']

In this example, A is mapped to a list of Alias objects, so the arguments of par1/par2/par3 are used to build the -A option (e.g., par1="mytext", par3=(12, 12) is converted to kwdict["A"]="mytext+o12/12"). This means we can now break any complicated GMT option into multiple long-form parameters.

The AliasSystem class provides the kwdict property, a dictionary with single-letter options as keys and strings/sequences as values. It can be passed directly to the build_arg_list function. The kwdict dictionary is computed dynamically from the current values of the long-form parameters. In this way, we can always access the original values of parameters by their long-form names, and even modify them before accessing the alias.kwdict property.
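The sketch below reproduces the AliasSystem example with hypothetical stand-ins (SketchAlias, SketchAliasSystem, build_arg_list_sketch). One assumption to flag: here the long-form values are handed to the system explicitly as a dict, whereas the implementation in #3238 may obtain them another way. Sequences are always joined with a separator, which is a simplification compared with the real Alias.

```python
from collections.abc import Sequence
from dataclasses import dataclass


@dataclass
class SketchAlias:
    """Hypothetical minimal alias: long-form name plus GMT prefix/separator."""

    name: str
    prefix: str = ""
    separator: str = "/"  # simplification: sequences are always joined

    def convert(self, arg):
        if arg is None or arg is False:
            return None  # parameter not set -> option omitted
        if arg is True:
            arg = ""  # bare flag, e.g. frame=True -> "-B"
        if isinstance(arg, Sequence) and not isinstance(arg, str):
            arg = self.separator.join(str(item) for item in arg)
        return f"{self.prefix}{arg}"


class SketchAliasSystem:
    """Hypothetical stand-in for the proposed AliasSystem (values passed explicitly)."""

    def __init__(self, values, **aliases):
        self.values = values    # long-form name -> original Python value
        self.aliases = aliases  # GMT option letter -> alias or list of aliases

    @property
    def kwdict(self):
        # Recomputed on every access, so later edits to self.values are seen.
        kwdict = {}
        for option, alias in self.aliases.items():
            group = alias if isinstance(alias, list) else [alias]
            parts = [a.convert(self.values.get(a.name)) for a in group]
            parts = [p for p in parts if p is not None]
            if parts:
                # Several long-form parameters may build one option, e.g. -A.
                kwdict[option] = "".join(parts)
        return kwdict


def build_arg_list_sketch(kwdict):
    """Simplified: one '-<option><value>' string per key, keys sorted."""
    return [f"-{key}{value}" for key, value in sorted(kwdict.items())]


def func(par0, par1=None, par2=None, par3=None, frame=False, panel=None, **kwargs):
    alias = SketchAliasSystem(
        {"par1": par1, "par2": par2, "par3": par3, "frame": frame, "panel": panel},
        A=[
            SketchAlias("par1"),
            SketchAlias("par2", prefix="+j"),
            SketchAlias("par3", prefix="+o", separator="/"),
        ],
        B=SketchAlias("frame"),
        c=SketchAlias("panel", separator=","),
    )
    # Short-form options passed straight through (e.g. J="X10c/10c") are kept.
    return build_arg_list_sketch({**alias.kwdict, **kwargs})


print(func("infile", par1="mytext", par3=(12, 12), frame=True, panel=(1, 2), J="X10c/10c"))
# ['-Amytext+o12/12', '-B', '-JX10c/10c', '-c1,2']
```

Because kwdict is a property rather than a stored dict, any change to a long-form value made inside the wrapper is reflected the next time kwdict is read.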

The BaseParam class for common parameters

As discussed in #1082, for some options, it makes more sense to define a class to avoid having too many (potentially conflicting) parameter names.

With the help of the Alias system, implementing BaseParam is easy. Users won't use the BaseParam class directly, but developers can use it to create new classes in a few lines without much coding effort (so adding new classes can be marked as a "good-first-issue"!).

The Box class

In pygmt/params/box.py, I've implemented the Box class as an example. The box parameter is commonly used when plotting scale bars, colorbars, the GMT logo, images, insets, and more, so it makes sense to have a Box class.

Below is the definition of the Box class. To define a class for a parameter, we just need to define some fields (e.g., clearance/fill) and the special field _aliases, which is a list of Alias objects.

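from collections.abc import Sequence
from dataclasses import dataclass
from typing import ClassVar

from pygmt.alias import Alias
from pygmt.params.base import BaseParam  # module path assumed; see PR #3238


@dataclass(repr=False)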
class Box(BaseParam):
    """
    Docstrings.
    """

    clearance: float | str | Sequence[float | str] | None = None
    fill: str | None = None
    innerborder: str | Sequence | None = None
    pen: str | None = None
    radius: float | bool | None = False
    shading: str | Sequence | None = None

    _aliases: ClassVar = [
        Alias("clearance", prefix="+c", separator="/"),
        Alias("fill", prefix="+g"),
        Alias("innerborder", prefix="+i", separator="/"),
        Alias("pen", prefix="+p"),
        Alias("radius", prefix="+r"),
        Alias("shading", prefix="+s", separator="/"),
    ]

Here is an example. Please refer to the docstrings for more examples.

>>> str(Box(clearance=(0.1, 0.2, 0.3, 0.4), pen="blue", radius="10p"))
'+c0.1/0.2/0.3/0.4+pblue+r10p'

Importantly, the Box class supports autocompletion!

The Frame/Axes/Axis classes

The -B option is one of the most complicated GMT options. It can be repeated multiple times on the GMT command line, which makes it even harder to support in Python.

In pygmt/params/frame.py, the Frame/Axes/Axis classes are implemented to address one of our oldest issues #249.

The technical details don't matter much here. Here is an example:

>>> import pygmt
>>> from pygmt.params import Frame, Axes, Axis

>>> fig = pygmt.Figure()
>>> # define a Frame object
>>> frame = Frame(
...     axes=Axes("WSen", title="My Plot Title", fill="lightred"),
...     xaxis=Axis(10, angle=30, label="X axis", unit="km"),
...     yaxis=Axis(20, label="Y axis")
... )
>>> fig.basemap(region=[0, 80, -30, 30], projection="X10c", frame=frame)
>>> fig.show()

Check out PR #3238 and try it yourself! Enjoy autocompletion!

Pros/Cons of the new alias system

Pros:

  1. The new and old alias systems can co-exist, so we don't have to migrate all wrappers in a single PR.
  2. Allow building a GMT option argument from multiple PyGMT parameters (More Pythonic)
  3. No abuse of kwargs anymore
  4. Define new parameter classes in a simple way
  5. Access the original argument by parameter name, not by dict lookup like kwargs.get("S") (possibly faster)
  6. Autocompletion for parameter classes like Box/Frame
  7. Autocompletion of all function parameters after #2896.
  8. Autocompletion for long-form arguments if we add type hints.

Cons:

  1. Big refactors may introduce new bugs. [We can always fix them if any.]
  2. The placeholder {aliases} in docstrings is not supported in the new alias system. [The list of aliases is not needed if we write good documentation.]

This is another big refactor towards a Pythonic interface! Ping @GenericMappingTools/pygmt-maintainers for comments and thoughts.

Thanks @seisman for opening this up for discussion. The Alias class you've implemented in #3238 seems to be meant for internal use (as a replacement for @use_alias), rather than something user-facing? I do like point 5 (Access the original argument by parameter name), which would help with simplifying the makeup of internal functions (especially high-level functions in the pipeline), and moving away from @ decorators means users will see a cleaner traceback on errors.

I'll need more time to look into your implementation at #3238. My initial impression is that the Alias class could be implemented as a first step in one PR, followed by the implementation of the Param class. I'm also wondering if this is a good time to bring in Pydantic to help with some validation logic based on type hints, essentially making syntax errors appear on the Python level rather than the GMT level (though that means PyGMT will need to re-implement a lot of GMT's internal validation logic).

The Alias class you've implemented in #3238 seems to be meant for internal use (as a replacement for @use_alias), rather than something user-facing?

Yes.

I'm also wondering if this is a good time to bring in Pydantic to help with some validation logic based on type hints, essentially making syntax errors appear on the Python level rather than the GMT level

It looks worth a try.