Inconsistent type checking between first and subsequent executions (possibly due to `SyntaxWarning: invalid escape sequence`)
bluenote10 opened this issue · 1 comments
Bug Report
The mypy type check gives different results between the first and subsequent executions in the following example.
To Reproduce
This is best reproduced by installing an external dependency whose code base contains invalid escape sequences. I'm using ray in this case (upstream issue is ray-project/ray#48921).
- Create a venv.
pip install mypy ray==2.39.0
- Create the following `example.py`:
import ray
dummy = None
ray.data.from_huggingface(dummy)
- Run `mypy example.py` twice.
Expected Behavior
The first and second (and all subsequent) executions of mypy should have the same type checking result.
Actual Behavior
The type checking result differs between the first and subsequent runs.
First execution:
$ mypy example.py
example.py:4: error: Module has no attribute "from_huggingface" [attr-defined]
Found 1 error in 1 file (checked 1 source file)
Subsequent executions:
$ mypy example.py
Success: no issues found in 1 source file
Additional observations:
- Removing the `.mypy_cache` folder essentially resets the behavior, i.e., the type check would fail again.
- When enabling "unused ignore" checking and putting a `# type: ignore` on that line, the behavior just flips, i.e., the first execution passes because mypy seems to require that ignore, but the subsequent runs now fail, because mypy doesn't want the ignore any more.
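The cold-cache versus warm-cache comparison described above can be scripted. The following is only a rough sketch: the helper names `run_mypy` and `compare_cold_and_warm` are made up for illustration, and it assumes mypy is installed in the current interpreter's environment and that the cache lives in the default `.mypy_cache` folder.

```python
import shutil
import subprocess
import sys


def run_mypy(target: str) -> tuple[int, str]:
    """Run mypy on `target` in a subprocess and return (exit code, stdout)."""
    proc = subprocess.run(
        [sys.executable, "-m", "mypy", target],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout


def compare_cold_and_warm(target: str, cache_dir: str = ".mypy_cache", runner=run_mypy):
    """Delete the cache, then run the checker twice and return both results.

    `runner` is a hypothetical injection point so the logic can be exercised
    without invoking mypy itself.
    """
    shutil.rmtree(cache_dir, ignore_errors=True)  # force a cold first run
    cold = runner(target)  # first run: no cache available
    warm = runner(target)  # second run: cache written by the first run
    return cold, warm
```

Calling `cold, warm = compare_cold_and_warm("example.py")` and checking `cold == warm` would then flag the inconsistency reported here.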
In my original reproduction in ray-project/ray#48921, mypy actually produced further output in the first/failing type check, hinting at a possible source of the problem:
/Users/.../lib/python3.12/site-packages/ray/data/grouped_data.py:350: SyntaxWarning: invalid escape sequence '\ '
"""Compute grouped min aggregation.
/Users/.../lib/python3.12/site-packages/ray/data/grouped_data.py:389: SyntaxWarning: invalid escape sequence '\ '
"""Compute grouped max aggregation.
/Users/.../lib/python3.12/site-packages/ray/data/grouped_data.py:428: SyntaxWarning: invalid escape sequence '\ '
"""Compute grouped mean aggregation.
/Users/.../lib/python3.12/site-packages/ray/data/grouped_data.py:470: SyntaxWarning: invalid escape sequence '\ '
"""Compute grouped standard deviation aggregation.
The ray code base indeed has these malformed escape sequences, which seem to cause this hiccup in mypy. I'm not entirely sure why I'm not seeing these additional warnings now in the minimal reproduction -- they seem to be a bit non-deterministic.
Your Environment
- Mypy version used: 1.13.0
- Mypy command-line flags: none
- Mypy configuration options from `mypy.ini` (and other config files): none, as discussed above
- Python version used: 3.10 and 3.12
I took a quick look into this. From what I can tell, the gist is that during the first run, mypy doesn't know that `ray.data` is a module and treats `ray.data.from_huggingface` as a non-module attribute access, while during subsequent runs it recognizes that `ray.data` is a module and processes the attribute access differently (specifically, this branch is taken differently in the first vs. subsequent runs).
I'm not sure what the reason for that is, but it's likely related to mypy not understanding the way that `ray` dynamically loads the `ray.data` submodule. In particular, the fact that `ray` (by design) never directly imports `ray.data`, but still lists `data` in its `__all__`.
This issue disappears if `ray.data` gets explicitly imported somewhere. For example, if you change the import in `example.py` to `import ray.data`, things will work as expected.
Probably a good way to improve things on ray's side would be adding some type-checking-only imports for its dynamically loaded submodules. That would help mypy (and perhaps other tools) understand how accessing `ray.data` behaves at runtime. Something like
# ray/__init__.py
from typing import TYPE_CHECKING
if TYPE_CHECKING:
    import ray.data
Testing that locally, it seems to make everything work as expected.
(As far as I can tell, the syntax warnings are not directly related).
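To illustrate the pattern discussed in this comment without depending on ray, here is a small self-contained sketch. It builds a throwaway package in a temp directory whose `__init__.py` lists a submodule in `__all__`, loads it lazily, and adds a type-checking-only import. The package name `lazypkg`, the function `from_something`, and the module-level `__getattr__` used for lazy loading are all assumptions made for the illustration, not ray's actual source.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical stand-in for a library that loads submodules on first access.
INIT_SOURCE = textwrap.dedent('''
    import importlib
    from typing import TYPE_CHECKING

    __all__ = ["data"]

    if TYPE_CHECKING:
        # Visible to type checkers only; never executed at runtime.
        import lazypkg.data

    def __getattr__(name):
        # Module-level __getattr__ (PEP 562): import the submodule lazily.
        if name in __all__:
            return importlib.import_module(f"{__name__}.{name}")
        raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
''')

DATA_SOURCE = "def from_something(obj):\n    return obj\n"


def build_and_import(root: Path):
    """Write the hypothetical package under `root` and import it."""
    pkg_dir = root / "lazypkg"
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text(INIT_SOURCE)
    (pkg_dir / "data.py").write_text(DATA_SOURCE)
    sys.path.insert(0, str(root))
    importlib.invalidate_caches()
    return importlib.import_module("lazypkg")


root = Path(tempfile.mkdtemp())
lazypkg = build_and_import(root)

# The guarded import does not run, so the submodule is not loaded yet.
loaded_eagerly = "lazypkg.data" in sys.modules

# First attribute access triggers the lazy import.
result = lazypkg.data.from_something(None)
loaded_after_access = "lazypkg.data" in sys.modules
```

In this sketch the `TYPE_CHECKING` guard leaves the runtime behavior lazy while still giving a type checker an explicit import to follow, which is the point of the suggestion above.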