pylint-dev/astroid

regression introduced in 2.13.0 (from 2.12.14) concerning `use_cache` in astroid's cache

ltcmelo opened this issue · 4 comments

This PR (#1747) introduced a regression in 2.13.0 (from 2.12.14). This regression causes undesired side effects — at least to me :-).

I rely on AstroidManager.astroid_cache to provide "special" implementations for certain modules. But the change in the PR mentioned above affects the behavior of AstroidManager.astroid_cache in a way that I can no longer ensure that those implementations are consistently used.

(And it appears that this side effect isn't intended by the PR, since the importing module isn't the same name as the imported module.)

Reproducible steps:

Under path /user/projects/

File /user/projects/main.py

import os
import os.path

import astroid
from astroid.manager import AstroidManager

mngr = AstroidManager()

app_dir_path = os.path.dirname(__file__)


def module_name_from_path(file_path: str):
    subpath = os.path.relpath(file_path, app_dir_path)
    mod_name = os.path.splitext(subpath)[0].replace(os.sep, '.')
    return mod_name


def parse_and_cache(mod_file_path: str, mod_name: str):
    with open(mod_file_path, 'r') as f:
        code = f.read()
    mod_node = astroid.parse(code, mod_name, mod_file_path, True)
    if mod_name in mngr.astroid_cache:
        del mngr.astroid_cache[mod_name]
    mngr.cache_module(mod_node)
    return mod_node


if __name__ == '__main__':
    prog_mod_path = os.path.abspath(os.path.join(app_dir_path, 'prog.py'))
    prog_mod_name = module_name_from_path(prog_mod_path)
    prog_mod_node = parse_and_cache(prog_mod_path, prog_mod_name)

    lib_dir_path = '/external/repository'
    os_dir_path = os.path.abspath(os.path.join(lib_dir_path, 'os', '__init__.py'))
    parse_and_cache(os_dir_path, 'os')
    ospath_mod_path = os.path.abspath(os.path.join(lib_dir_path, 'os', 'path.py'))
    parse_and_cache(ospath_mod_path, 'os.path')

    val = next(prog_mod_node.body[1].body[0].value.func.infer())
    print(val.parent.frame().file)

File /user/projects/proj.py:

import os.path

def f():
    os.path.relpath('abc')

Under path /external/repository/

This path must not be a subpath of /user/projects/.
Directory structure below:

$ tree /external/repository
/os
├── __init__.py
└── path.py

File /external/repository/os/init.py:

from . import path as path

File /external/repository/os/path.py:

def relpath(s):
    return " "

Output of run

With:

  • astroid 2.12.14 (this is correct/expected "special" os.path)
    /external/repository/os/path.py

  • astroid 2.13.0 (the system os.path)
    /usr/local/Cellar/python/…/lib/python3.10/posixpath.py

Details

I can observe the bug in do_import_module when modname is None and also self.modname is None, which in turn leads to use_cache = False. For instance, if you amend if mymodule.relative_to_absolute_name(modname, level) == mymodule.name: with if modname and mymodule.relative_to_absolute_name(modname, level) == mymodule.name:, then the bug is gone; but I'm not an astroid developer and I imagine that isn't the right place to fix the issue… it probably has a deeper root cause.

Sorry for taking so long to get back to this. I think it is very likely that we might refactor AstroidManager in our efforts to fix our multiprocessing issues. That is why I'm inclined to leave this issue for now and see if it still exists after the refactor and what our stance would then be.

Ended up in the same place after opening #2124. I do think we should reconsider this as part of 3.0. We may look into multiprocessing first, or we may look at this part first, I'm not sure.

We may look into multiprocessing first, or we may look at this part first, I'm not sure.

Caching the sensitive part seems a lot easier to implement than multiprocessing ? I imagine it as "making sure that we can cache a function because the value returned for a particular input is always the same then adding a decorator on a function" vs "making sure that the design we have to invent can merge the result of two parallel parsing then actually implement it" ?

For instance, if you amend if mymodule.relative_to_absolute_name(modname, level) == mymodule.name: with if modname and mymodule.relative_to_absolute_name(modname, level) == mymodule.name:, then the bug is gone; but I'm not an astroid developer and I imagine that isn't the right place to fix the issue… it probably has a deeper root cause.

I just independently arrived at the same patch. I think it's correct. #1747 didn't intend to create endless re-instantiation of modules as described in #2124. I'll open a PR and credit you @ltcmelo. Thanks!