BrianPugh/cyclopts

Shorter validator syntax

mgielda opened this issue ยท 14 comments

Hi @BrianPugh - first of all, hats off for the nice library, I laughed when I read the "what you thought Typer was" line. I am a heavy Typer user (and of course credit where it's due, it's a great lib) but the issues you mention in your comparison bugged me, so I'm glad I stumbled upon cyclopts, and I think I will be switching to it.

I am now playing around with some Path type parameters (which are common with my scripts given that I manipulate files a lot) - and wanted to use validators, but admittedly they blow up the function signature a lot. Did you think about some alternative way to specify it? Like additional annotations for the validated parameters, or perhaps docstrings? Or some other way to reduce the amount of typing involved (pun intended)? So that we could bring the

def foo(path: Annotated[Path, Parameter(validator=validators.Path(exists=True))]):

example closer to

def foo(path: Path):

(the latter being kinda "the original promise of Typer")

I know I know, you don't have to use validators if you do not want to, but they make so much sense instead of implementing your own logic for this every time! If only we could find a way to make them less verbose.

Thanks for the kind words ๐Ÿ˜„ .

Here are some thoughts and solutions:

  1. The current verbose syntax is the "most" correct and all python users and tools would say "that's fine". As in we aren't doing anything "weird".
def foo(path: Annotated[Path, Parameter(validator=validators.Path(exists=True))]):
  1. We could just shorted Annotated and Parameter. Not great, not terrible.
from typing import Annotated as A
from cyclopts import Parameter as P, validators as v

def foo(path: A[Path, P(validator=v.Path(exists=True))]):
  1. Lastly, (perhaps what you want) we could just define a type. Downside of this is that the actual type of the parameter is less clear to someone not familiar with your codebase.
ExistingPath = Annotated[Path, Parameter(validator=validators.Path(exists=True))]

def foo(path: ExistingPath):

If you go with (3), you can add additional Parameter annotations and cyclopts should resolve it properly:

ExistingPath = Annotated[Path, Parameter(validator=validators.Path(exists=True))]

def foo(path: Annotated[ExistingPath, Parameter(name="bar")]):
    # Works as expected, the `validators.Path` is used and the CLI uses `--bar`

If you can think of an alternative syntax, I'm all ears. I don't want to add any true interpreted logic from the docstring; I only want to use the docstring for the help-page (since that's classically the purpose/responsibility of a docstring).

I'm currently working on cyclopts v2 (which is like 99% backwards compatible) that will have:

  • command and parameter groups (for help-page organization and inter-parameter validation logic).
  • Improved parsing defaults for more intuitive error messages.

This is to say, I'm not overly concerned with a new syntax that has to be 100% backwards compatible if it brings significant value.

Wow, thanks for the instant reply :) Let me comment keeping your original numbering:

  1. Yeah, I know we aren't doing anything weird, just wordy, which is sometimes fine. Typer made me kind of lose hope in being able to keep my function signatures short anyway so this did not stand out, but cyclopts indeed lets me keep them short, barring this one thing.

  2. Not terrible indeed. Not great either, but food for thought!

  3. This actually sounds nice too, with the downside being what you observed.

Docstrings - fair enough, I can see how this would be going in a direction that you might not want.
When I wrote "additonal annotations" above I actually mean decorators, sorry. What's your take on sth like:

from cyclopts import app, validate, validators

@app.command
@validate(path, validators.Path(exists=True))
def foo(path: Path):
    pass

Another thought I had was letting you set some default validators for specific types when setting up the app, and then you only specify the validators if you need them to be different. So you can essentially let all of your paths be validated in a specific way, or all your ints etc.

It's not perfect, but I have the following planned (and partially implemented) for v2. App (and by extension, @app.command) accepts an optional validator callable that gets passed all parsed/converted/validated arguments as keywords; here's the relevant docstring:

    validator: Union[None, Callable, List[Callable]]
        A function (or list of functions) where the CLI-provided group variables of ``default_command`` will be keyword-unpacked, regardless of their positional/keyword-type in the command function signature. The python variable names will be used, which may differ from their CLI names.

        Example usage:

        .. code-block:: python

           def validator(**kwargs):
               "Raise an exception if something is invalid."

        The app-validator runs **last**.
        This parameter is keyword-only.

so for this use-case, you could do something like:

from cyclopts import app, validate, validators

def all_paths_must_exist(**kwargs):
    for value in kwargs.values():
        if isinstance(value, Path) and not value.exists():
            raise ValueError(f"{Path} does not exist.")


@app.command(validator=all_paths_must_exist)
def foo(path: Path):
    pass

What are your thoughts on this solution?

The intention here is to keep as consistent and minimal of an interface as possible between all the cyclopts components. Hopefully this results in intuitive code/usage without having to read a ton of documentation.

ETA for v2 is by the end of the month (January 2024).

Your proposal is indeed interesting. When thinking of potential uses I however thought of common cases like mv (first file must exist, the other probably even should not) where neither my "default validator for specific type app-wide" nor your "blanket validator which validates all Paths" would work. A decorator would however let you differentiate those easily, and here's one more thought:

One thing that keeps haunting me is that when you do stuff like Path validation with annotations, assuming your function is callable both from a cyclopts (or typer etc.) CLI and library-style, the latter use does not get any validation, which eventually might lead to someone adding the validation inside the function anyway (which makes no sense). I am wondering if a decorator could not somehow be used in an advantageous way, essentially also providing validation even when the function is not called from the CLI (since I fail to see how you'd like to have a function that is OK with nonexisting files in a library use and not in CLI use or vice versa).

Appreciate your super fast turnaround here. I was not expecting any meaningful resolution here anytime soon, though of course it would be incredible if v2 makes it easier to do the kind of stuff we are discussing.

Anyway in the little time I spent thinking about this, my heart warms up the most to the decorator concept, though it's not that I have any details figured out, know what I am talking about or have consulted it with anyone internally. Might actually discuss with a bunch of folks more knowledgeable in Python than me, but very interested in your opinion on having an additional decorator in general. I am even fine with not having this feature in mainline cyclopts and instead having a different little package, as long as it's implementable and does not break other stuff (but of course mainline support would be even better).

When thinking of potential uses I however thought of common cases like mv (first file must exist, the other probably even should not)

I think for situations like that, we're not going to get better than the Annotated[Path, Parameter(...)] solution.

assuming your function is callable both from a cyclopts (or typer etc.) CLI and library-style, the latter use does not get any validation, which eventually might lead to someone adding the validation inside the function anyway (which makes no sense).

I've also had this nagging thought and have the following thoughts:

  1. Frequently (but not always!), Typer/Cyclopts functions end up being specialized to the CLI due to inherent invocation differences.
  2. An external library (or project-specific) validation library should be used. I think generic function-argument-validation gets a bit out of scope for Cyclopts. However, if a third party library offered a decorator validate, then it should be inherently compatible with cyclopts:
@app.command
@validate(path, validators.Path(exists=True))  # This comes from a 3rd-party library.
def foo(path: Path):
    pass

A similar tool is pydantic's validate_call decorator. Thinking it through, I guess it wouldn't be a bad idea to have a similar decorator compatible with Cyclopts's Parameter.

I think the python-library-validation solution would have to look like:

@app.command(validate=True)  # This returns a function that validates foo's arguments prior to execution.
def foo(path: Annotated[Path, Parameter(validator=validators.Path(exists=True))]):
    pass

where @app.command(validate=True) returns a wrapped version of foo that performs validation (normally, @app.command returns exactly foo). So in this schema:

from cyclopts import app, validate, validators

def all_paths_must_exist(**kwargs):
    for value in kwargs.values():
        if isinstance(value, Path) and not value.exists():
            raise ValueError(f"{Path} does not exist.")


@app.command(validator=all_paths_must_exist, validate=True)  # Maybe `validate` should default to True?
def foo(path: Path):
    pass
    
foo(Path("this/path/exists"))  # validation is performed

I think this is feasible from the implementation-side. The reason I'm trying to avoid stacking decorators is we would have to store an attribute like __cyclopts__ to the decorated function, and I'm trying kind of hard to leave the function untouched. Also, frequently developers forget the order in which decorators resolve ๐Ÿ˜† .

Let's let this digest for a little bit; I won't be able to work on it until the other slated-features are implemented, anyways. I agree that this would be a useful feature.

Took me deep down Python's decorator rabbit hole, but this seems to work!

#!/usr/bin/env python3

from cyclopts import App, validators
from pathlib import Path

from functools import wraps

from inspect import signature 

def validate(**validators):
    def inner_decorator(f):
        @wraps(f)
        def wrapper(*args, **kwargs):
            parameters = {}
            sig_pars = list(signature(f).parameters.items()) 
            for i in range(0, len(args)):
                parameters[sig_pars[i][0]] = {'t': sig_pars[i][1], 'v': args[i]}
            for a in kwargs:
                parameters[a] = {'t': type(kwargs[a]), 'v': kwargs[a]}
            for v in validators:
                if v in parameters:
                    validators[v](parameters[v]['t'], parameters[v]['v'])
            
            return f(*args, **kwargs)
        return wrapper
    return inner_decorator

app = App()

@app.command
@validate(src=validators.Path(exists=True))
@validate(tmp=validators.Path(exists=True))
def mv(src: Path, dst: Path, *, tmp: Path = None):
    if tmp is None:
        print(f"Moving {src} to {dst}.")
    else:
        print(f"Moving {src} to {dst} via {tmp}.")

if __name__ == "__main__":
    app()

Of course I am sure there are bugs, so please point them out if you have ideas. I would love to have this either as a cyclopts capability if it does not break your other assumptions / ideas, or I can make a small separate package that would be used alongside cyclopts out of that. What do you think?

Naturally, that is, unless I am missing something important here, which I may well be. I only had a chance to discuss it with one colleague who said "yup, this should be possible" so I just went on and implemented it.

I assume the actual code would have to be a bit longer, e.g. handle mismatches between the actual argument names and the ones given to the "validate" decorator, but the core logic is there.

Huh! While browsing the web found this - https://www.reddit.com/r/Python/comments/18j41fv/comment/kdjtlpb/ - where essentially the author of Feud is discussing how Feud handles validation better (since it's based on pydantic).

At first glance, reading through the thread, I am not sold on some explanations how Feud is better - e.g. arbitrary choice to use classes for command grouping does not seem like something I enjoy (plac did that, I hated that about plac). It does seem to be interesting to look at, perhaps there are some inspirations on how Pydantic could integrate with Cyclopts for better validation, or in other areas.

Actually the more time I spend comparing the different Python libraries for CLI parsing, the more I feel the "comparison" angle, instead of being included in the docs of specific projects (or on Reddit), should be a neutral repo / website with a matrix of X vs Y with all the major and/or modern tools included, and you just PR against the repo with a fair comparison when you have one, including a date / version nos of when the comparison was made (and people can update it if it becomes obsolete e.g. when they fix a deficiency in their own library) :)

I digress, again. Hopefully this is interesting background and food for thought.

nice work! I almost want to include this, but here's my train of thought:

  1. To minimize the API surface, I think it would be good to make the decorating the __call__ method of Parameter, so it would look like:
@app.command
@Parameter(name="src", validator=validators.Path(exists=True))
@Parameter(name="tmp", validator=validators.Path(exists=True))
def mv(src: Path, dst: Path, *, tmp: Path = None):
    pass

if __name__ == "__main__":
    app()
  1. However, I don't like this because it breaks the Parameter-resolution-order. I.e. in the following example, the converter won't propagate, which isn't intuitive/expected.:
@app.command(default_parameter=Parameter(converter=lambda x, y: Path(y)))
@Parameter(name="src", validator=validators.Path(exists=True))
@Parameter(name="tmp", validator=validators.Path(exists=True))
def mv(src: Path, dst: Path, *, tmp: Path = None):
    pass
  1. This introduces a second, click-like way of using Cyclopts, which may just lead to confusion.

I'd need to mull it over more.

Yeah, I dislike the click-like way and the confusion it brings (also the wordiness). I kind of wanted 'validate' to be seen like a separate functionality specifically for that problem. Obviously we may be missing some bigger picture. I am curious if looking at Feud's use of Pydantic wouldn't yield some good thoughts.

Perhaps the solution is this pydantic + annotated-types. In cyclopt's current form, I think maybe the following "just works." I'll have to play with it later.

@app.command
@pydantic.validate_call
def find_file(path: DirectoryPath, regex: Pattern, max=None) -> Optional[Path]:
    pass

I feel so dumb now, since it seems like you're guyfrom7up on Reddit. Now it all makes sense. So you'd already been chatting.

So having remembered why I am a bit skeptical of Pydantic after re-reading the cattr guy's writeup, I now think that I'll stick with the validate decorator and stuff it somewhere in a small reusable lib to be used alongside cyclopts for now.

I've improved pydantic support in the develop-v2 branch.

See documentation here: https://cyclopts.readthedocs.io/en/develop-v2/pydantic.html

I'll leave this issue open until v2 is released.

I think we're getting close to a v2 release. I added all the changes I had listed, I just want to do some sanity checks and kick the tires a bit more first.

closing with the release of v2. We can reopen this issue, or open a new one, should the need arise.

Pre-defined annotated types have been added to cyclopts, significantly reducing the verbosity of these common situations.

https://cyclopts.readthedocs.io/en/latest/api.html#types