Make astroid typable possibily with breaking change to permit most performance increase possibility
Pierre-Sassoulas opened this issue · 3 comments
astroid is very dynamic and very hard to type. A lot of effort went into typing it, and mypy is still not activated (#1287). It prevents automated error detection in pylint and it prevent most performance possibility that always start with proper typing, be it mypyc, a partial rewrite in a rust crate (#2014) or cython.
I'm opening this discussion so we can discuss the breaking changes we need to do to make it easier to type astroid in the future.
Many node types have fields that are marked as optional when really they are required. These fields are assumed to be non-optional throughout the Astroid code, and that's a major source of type errors detected by Mypy.
For example, consider the Assert
node. It has these fields:
self.test: NodeNG | None = None
"""The test that passes or fails the assertion."""
self.fail: NodeNG | None = None # can be None
"""The message shown when the assertion fails."""
An assert
statement without a test is not valid syntax, so the test
field needs to be required. On the other hand, the fail message really is optional, so the fail
field really is optional.
A symptom of the problem is the frequent appearance of # can be None
comments. These show that the existing type annotations are not reliable.
Now, the reason AFAICT that these fields are all marked as optional is that the type annotations are declared in __init__
but the values for those fields are not available until postinit
is called. There is an easy solution to this: declare the types outside of __init__
. This has the happy side effect of making it possible to get rid of __init__
definitions altogether for many nodes, since all they do besides declaring types is calling super().__init__
.
Now, the reason AFAICT that these fields are all marked as optional is that the type annotations are declared in
__init__
but the values for those fields are not available untilpostinit
is called.
This is exactly why. When I was first defining the types of everything all the way back when I added docstrings to everything, I had to make everything optional because of how __init__
and postinit
worked.
There is an easy solution to this: declare the types outside of
__init__
. This has the happy side effect of making it possible to get rid of__init__
definitions altogether for many nodes, since all they do besides declaring types is callingsuper().__init__
.
We'd risk the potential for some of the attributes to be accidentally left unset by the postinit
if we took this approach. If we think that's fine, then this seems like the easier solution.
Personally I'd want to be looking to eliminate the postinit
methods where possible. For many node types we create a node and then immediately call the postinit
. This wouldn't be possible for nodes where parents need to store mandatory references to the children, and children need to store mandatory references to the parents.
For those nod types, we could annotate those fields as mandatory and deal with mypy complaining about them being unset internally. It wouldn't allow us to use cython, but it would make the type annotations be correct for users at least. We could minimise this being an issue if, for example, we always created child nodes first so that only the parent
attribute is temporarily left unset.