python/mypy

Inconsistent type checking between first and subsequent executions (possibly due to `SyntaxWarning: invalid escape sequence`)

bluenote10 opened this issue · 1 comments

Bug Report

The mypy type check gives different results between the first and subsequent executions in the following example.

To Reproduce

This is best reproduced by installing an external dependency, whose code base contains invalid escape sequences. I'm using ray in this case (upstream issue is ray-project/ray#48921).

  1. Create a venv.
  2. pip install mypy ray==2.39.0
  3. Create the following example.py
import ray

dummy = None
ray.data.from_huggingface(dummy)
  4. Run mypy example.py twice.

Expected Behavior

The first and second (and all subsequent) executions of mypy should have the same type checking result.

Actual Behavior

The type checking result differs between the first and subsequent runs.

First execution:

$ mypy example.py
example.py:4: error: Module has no attribute "from_huggingface"  [attr-defined]
Found 1 error in 1 file (checked 1 source file)

Subsequent executions:

$ mypy example.py
Success: no issues found in 1 source file

Additional observations:

  • Removing the .mypy_cache folder essentially resets the behavior, i.e., the type check would fail again.
  • When enabling "unused ignore" checking and putting a # type: ignore on that line, the behavior just flips, i.e., the first execution passes because mypy seems to require that ignore, but the subsequent runs now fail, because mypy doesn't want the ignore any more.
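
A minimal sketch for comparing a cold-cache run with a warm-cache run side by side. The helper name `run_twice` is made up for illustration. The commented usage assumes mypy is installed in the active environment, that example.py is in the current directory, and that the default `.mypy_cache` folder is in use.

```python
import shutil
import subprocess
import sys
from pathlib import Path
from typing import List, Tuple


def run_twice(cmd: List[str], cache_dir: Path) -> Tuple[str, str]:
    """Delete cache_dir, run cmd twice, and return the stdout of both runs."""
    # Start from a cold cache so the first run is comparable to a fresh checkout.
    shutil.rmtree(cache_dir, ignore_errors=True)
    outputs = []
    for _ in range(2):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        outputs.append(proc.stdout.strip())
    return outputs[0], outputs[1]


# Possible usage (assumption: mypy is importable by this interpreter):
#
#   cold, warm = run_twice(
#       [sys.executable, "-m", "mypy", "example.py"], Path(".mypy_cache")
#   )
#   print("consistent" if cold == warm else "inconsistent")
```

If the two returned strings differ, the check depends on the cache state, which is the behavior reported here.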

In my original reproduction in ray-project/ray#48921, mypy actually produced further output in the first/failing type check, hinting at a possible source of the problem:

/Users/.../lib/python3.12/site-packages/ray/data/grouped_data.py:350: SyntaxWarning: invalid escape sequence '\ '
  """Compute grouped min aggregation.
/Users/.../lib/python3.12/site-packages/ray/data/grouped_data.py:389: SyntaxWarning: invalid escape sequence '\ '
  """Compute grouped max aggregation.
/Users/.../lib/python3.12/site-packages/ray/data/grouped_data.py:428: SyntaxWarning: invalid escape sequence '\ '
  """Compute grouped mean aggregation.
/Users/.../lib/python3.12/site-packages/ray/data/grouped_data.py:470: SyntaxWarning: invalid escape sequence '\ '
  """Compute grouped standard deviation aggregation.

The ray code base indeed has these malformed escape sequences, which seem to cause this hiccup in mypy. I'm not entirely sure why I'm not seeing these additional warnings now in the minimal reproduction -- they seem to be a bit non-deterministic.
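
For reference, a small sketch of what triggers this kind of warning, independent of ray and mypy. The two source strings and the helper `escape_warnings` are invented for illustration. The warning category depends on the Python version (a `DeprecationWarning` before 3.12, a `SyntaxWarning` from 3.12 on), so the helper filters on the message text instead.

```python
import warnings
from typing import List

# A docstring containing a backslash followed by a space: an invalid escape.
BAD_SOURCE = r'''
def grouped_min():
    """Compute grouped min aggregation.\ """
'''

# The same docstring as a raw string: the backslash is taken literally.
GOOD_SOURCE = r'''
def grouped_min():
    r"""Compute grouped min aggregation.\ """
'''


def escape_warnings(source: str) -> List[str]:
    """Compile source and return the invalid-escape warnings it produced."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<example>", "exec")
    return [
        str(w.message)
        for w in caught
        if "invalid escape sequence" in str(w.message)
    ]


print(escape_warnings(BAD_SOURCE))
print(escape_warnings(GOOD_SOURCE))
```

The warning is raised while the source text is being compiled, so it only shows up when a file is actually (re)parsed.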

Your Environment

  • Mypy version used: 1.13.0
  • Mypy command-line flags: none
  • Mypy configuration options from mypy.ini (and other config files): none, as discussed above
  • Python version used: 3.10 and 3.12

I took a quick look into this. From what I can tell, the gist is that during the first run, mypy doesn't know that ray.data is a module and treats ray.data.from_huggingface as a non-module attribute access, while during subsequent runs it recognizes that ray.data is a module and processes the attribute access differently (specifically, this branch is taken differently in the first vs. subsequent runs).

I'm not sure what the reason for that is, but it's likely related to mypy not understanding the way that ray dynamically loads the ray.data submodule. In particular, ray (by design) never directly imports ray.data, but still lists data in its __all__.
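
To make that pattern concrete, here is a minimal sketch using a throwaway package. The package name `lazypkg`, its `from_something` function, and the use of a module-level `__getattr__` are all assumptions chosen for illustration; ray's actual loading code may work differently. The point is only that nothing in `__init__.py` statically imports the submodule, yet attribute access works at runtime.

```python
import importlib
import sys
import tempfile
from pathlib import Path
from typing import Tuple

# __init__.py of the hypothetical package: "data" is advertised in __all__
# but only imported when it is first accessed as an attribute.
INIT_SOURCE = '''
import importlib

__all__ = ["data"]

def __getattr__(name):
    # Module-level __getattr__ is called only when normal lookup fails.
    if name in __all__:
        return importlib.import_module(f"{__name__}.{name}")
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
'''

DATA_SOURCE = '''
def from_something(x):
    return x
'''


def demo() -> Tuple[bool, bool, object]:
    """Build the package in a temp dir and access lazypkg.data lazily."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "lazypkg"
        pkg.mkdir()
        (pkg / "__init__.py").write_text(INIT_SOURCE)
        (pkg / "data.py").write_text(DATA_SOURCE)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            import lazypkg

            loaded_before = "lazypkg.data" in sys.modules
            result = lazypkg.data.from_something(None)  # triggers the import
            loaded_after = "lazypkg.data" in sys.modules
        finally:
            # Clean up so the temporary package does not leak into the session.
            sys.path.remove(tmp)
            sys.modules.pop("lazypkg.data", None)
            sys.modules.pop("lazypkg", None)
    return loaded_before, loaded_after, result


print(demo())
```

A static analyzer that only reads `__init__.py` sees no import of the submodule, so whether it treats `lazypkg.data` as a module depends on whether something else has already made it aware of that submodule.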

This issue disappears if ray.data gets explicitly imported somewhere. For example, if you change the import in example.py to import ray.data, things will work as expected.

Probably a good way to improve things on ray's side would be adding some type-checking-only imports for its dynamically loaded submodules. That would help mypy (and perhaps other tools) understand how accessing ray.data behaves at runtime. Something like:

# ray/__init__.py
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    import ray.data

Testing that locally, it seems to make everything work as expected.

(As far as I can tell, the syntax warnings are not directly related).