Improve `inferred` interface

Question

Improve `inferred` interface

ostr00000 opened this issue 2 years ago · 7 comments

Steps to reproduce

Consider following code:

node = extract_node(
        """
if hasattr(typing, 'some_future_function'):
    future_function = typing.some_future_function
else:
    def future_function(arg: int) -> bool:
        return False
future_function
    """
)
inferred_any = node.inferred()[0]
assert inferred_any is not util.Uninferable

Current behavior

This code fail, because it inferred function return inferred values in order of its occurrence in analyzed code.

Expected behavior

A NodeNG interface should have a method to take only one best inferred value.

Considered changes

If function inferred find any non Uninferable value, then remove all Uninferable from returning list:
- probably many tests will fail,
- maybe sometimes we want to know about uncertainty inferred value,
In function inferred before returning, we can sort values with most "valuable" values at the beginning:
- some tests fail, because previously there wasn't such assumption about order,
Add another method def inferred_best(self) -> InferenceResult to NodeNG, where it returns single inferred value, possibly avoid Uninferable.

I think the 3rd option is the best and I prepared some code, but before creating pull request I want to know others opinions.

Rationale

It is important for other functionality (especially in pylint-dev/pylint#7565 - where inferred value is considered as Uninferable despite having another better value).
inferred()[0] is a frequent code (about 60 occurrences in astroid and pylint), so it seems reasonable to replace it with inferred_best() - I tested such a change locally, and it seems ok.

`python -c "from astroid import pkginfo; print(pkginfo.version)"` output

5.16.0dev0

Answer 1 · 2023-03-26T08:09:52.000Z

I think this make sense if we want to decreases false positives in pylint. It should not help too much because we're supposed not to raise when we can't infer already. When you say you tested did you mean you ran pylint's tests with the modified code ? I would be very surprised if it did not lead at least to some behavior changes.

Answer 2 · 2023-03-26T11:35:02.000Z

When you say you tested did you mean you ran pylint's tests with the modified code ?

Yes, pylint and astroid test. I created PR to run tests in astroid.
In pylint there are only 2 possible replacement locations - one of them is the code used by mentioned pylint issue.

Answer 3 · 2023-03-26T21:18:31.000Z

I'm not sure if I agree with the proposal. It feels like we are hiding results with magic that makes the behaviour less explicable.
Uninferables aren't incorrect or useless, they show that the inference "engine" has genuine difficulty with understanding the code. I think using this in pylint only increases the risk of creating false positives as something that is now indeterminable would be considered determined by just taking part of the inference result.

I don't think we should support this a public API.

Answer 4 · 2023-03-26T23:56:25.000Z

Well, I am ok if you reject it. But the problem still remains.

In a Python typing system, the expression future_function from "Steps to reproduce" in this issue is Union[Any, Callable[[int], bool]]. Currently, when we are considering the result of inferred, we take only Any type. In the ideal world, we should consider all possible union types. I do not understand why we take inferred()[0] instead of for example inferred()[-1]? - this is what I would call It feels like we are hiding results with magic.
For me, the result is "random" - if we swap if-branches order in user code, we got totally different result.

Of course, we can still keep the result of such inference randomly, but then issues similar to pylint-dev/pylint#7565 are unable to resolve.
Prioritizing any "know type" over Uninferable is in my opinion the most desired behavior. Alternatively, maybe we should choose Uninferable over "know type", to make this function more deterministic?

The first change that I considered while solving pylint-dev/pylint#7565 was to make this sorting right here. But then I thought: "wait, that should be in interface of NodeNG", so here is this issue.
I agree that Uninferables aren't incorrect or useless. We have information whenever the type is fully/partially inferable from inferred function. But the real use case of replaced calls in this PR is to get something better than Any/Uninferable.
Actually, now when I considered this typing topic deeper, I think that in the ideal world, instead of using inferred()[0], we should use a loop over all possible values. Obviously, this is totally impractical (both for library code and from the user point of view). This is the reason why Union is not allowed as a return type in typeshed (typeshed convention: avoid union return types). Additional explanation:

an operation is okay for a union type if it is valid for all items of the union
python/mypy#1693 (comment)

Summing up:

We should decide about determinism of inferred (+ maybe some comment in docs?),
Is function inferred_best improve the usefulness of pylint or make it worse?
How the original issue that I was going to fix pylint-dev/pylint#7565 should be solved if this PR will be rejected?

Answer 5 · 2023-03-28T06:25:11.000Z

Well, I am ok if you reject it. But the problem still remains.

In a Python typing system, the expression future_function from "Steps to reproduce" in this issue is Union[Any, Callable[[int], bool]]. Currently, when we are considering the result of inferred, we take only Any type. In the ideal world, we should consider all possible union types. I do not understand why we take inferred()[0] instead of for example inferred()[-1]? - this is what I would call It feels like we are hiding results with magic. For me, the result is "random" - if we swap if-branches order in user code, we got totally different result.

This is why we have safe_infer. Ideally we would always check all results of infer() or use safe_infer to make sure we are using the "full" result and not [0] or [-1].
I know there are still cases in the code where we don't do this, but those should ideally be refactored.

Of course, we can still keep the result of such inference randomly, but then issues similar to PyCQA/pylint#7565 are unable to resolve. Prioritizing any "know type" over Uninferable is in my opinion the most desired behavior. Alternatively, maybe we should choose Uninferable over "know type", to make this function more deterministic?

I think we shouldn't prioritize anything. We should either check that all results are of the same type or indicate that there is ambiguity. See safe_infer as mentioned above.

The first change that I considered while solving PyCQA/pylint#7565 was to make this sorting right here. But then I thought: "wait, that should be in interface of NodeNG", so here is this issue. I agree that Uninferables aren't incorrect or useless. We have information whenever the type is fully/partially inferable from inferred function. But the real use case of replaced calls in this PR is to get something better than Any/Uninferable. Actually, now when I considered this typing topic deeper, I think that in the ideal world, instead of using inferred()[0], we should use a loop over all possible values. Obviously, this is totally impractical (both for library code and from the user point of view). This is the reason why Union is not allowed as a return type in typeshed (typeshed convention: avoid union return types). Additional explanation:

an operation is okay for a union type if it is valid for all items of the union
python/mypy#1693 (comment)

That explanation of not using Union is irrelevant here and we shouldn't care too much about typing anyway. pylint doesn't concern itself with type annotations but only with what it can infer statically.

Summing up:

1. We should decide about determinism of `inferred` (+ maybe some comment in docs?),

As said above, let's not do determinism but allow the user/library to determine the degree of determinism they want. astroid/pylint can't get that degree right 100% of the time so we should not try to do so.

2. Is function `inferred_best` improve the usefulness of pylint or make it worse?

Use safe_infer or consider all results of infer()

3. How the original issue that I was going to fix [typing_extensions assert_never's Never not supported pylint#7565](https://github.com/PyCQA/pylint/issues/7565) should be solved if this PR will be rejected?

This is a different issue again. It is expected that code that isn't available on Python 3.10 will show up as Uninferable. We don't maintain a list of things that are added in different version so that you can still lint Python 3.10 code with a 3.7 interpreter. The suggested fix should be applied so that pylint ran with 3.11 behaves correctly, but we don't do "backwards compatibility" of linting new features on older interpreters (that would be way too much work to maintain).

Answer 6 · 2023-03-28T19:00:48.000Z

maybe we should choose Uninferable over "know type"

This is why we have safe_infer.

It seems that this is what I overlooked.

pylint doesn't concern itself with type annotations but only with what it can infer statically.

Yes, I know it, but the inferred types (union of types) are still subject to typing laws, but I see you are aware of it (Ideally we would always (...) make sure we are using the "full" result and not [0] or [-1]).

It is expected that code that isn't available on Python 3.10 will show up as Uninferable

It makes sense. Although, from this sentence it can be concluded that pylint will never support a "compatibility" library. By compatibility, I mean a lot of if-statements with hasattr, import redirections and alternative declarations (ex. typing_extensions).
The only code without errors is when we use the newest version of a redirected library (ex. typing), which make the compatibility library a bit useless.

Anyway, I am closing this issue along with PR, since this I have no more concerns. Thanks for a clean explanation!

Answer 7 · 2023-03-28T19:04:00.000Z

It makes sense. Although, from this sentence it can be concluded that pylint will never support a "compatibility" library. By compatibility, I mean a lot of if-statements with hasattr, import redirections and alternative declarations (ex. typing_extensions). The only code without errors is when we use the newest version of a redirected library (ex. typing), which make the compatibility library a bit useless.

You could look into using py-version. We try to maintain backwards compatibility for newer interpreters for old versions. So, we warn when you use something that is not available in the lower bound version you set with py-version. It also has its flaws and is not implemented for 100% of the checks (although the most important ones are), but it might help out here!

Thanks for a clean explanation!

Thanks for the compliment! And thank you for opening the issue and engaging in a meaningful discussion. Even though we haven't helped you directly it is also good for us to see how people use pylint and what limitations they run into.