pylint-dev/pylint

E0401 (import-error) checks perform repeated _has_init and stat calls

correctmost opened this issue · 0 comments

Bug description

In astroid, there's a _has_init function that looks for the presence of __init__.pyi, __init__.py, and other __init__.* files in a directory.

https://github.com/pylint-dev/astroid/blob/098438683cac8d53e67be75856d7d7aab446bb49/astroid/modutils.py#L669-L678

This function is called repeatedly with the same directory arguments. When running pylint on the yt-dlp codebase, _has_init ends up performing ~43,000 stat calls, almost all of which are redundant.

Applying a cache to the function brings the number of stat calls down to ~80 and reduces execution time by ~300 ms (~34.1 s -> ~33.8 s).
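
For illustration, the caching idea could look roughly like the sketch below. This is not the actual astroid code or a proposed patch: the function name has_init_cached, the INIT_EXTENSIONS tuple, and the cache size are assumptions made up for the example, and the real _has_init may check a different set of extensions.

```python
import functools
import os
from typing import Optional

# Hypothetical extension list for illustration only; astroid's real list may differ.
INIT_EXTENSIONS = ("pyi", "py", "pyc", "pyo")


@functools.lru_cache(maxsize=1024)
def has_init_cached(directory: str) -> Optional[str]:
    """Return the path of the first __init__.* file found in directory, else None.

    Results are memoized per directory string, so repeated lookups for the
    same directory do not touch the filesystem again.
    """
    base = os.path.join(directory, "__init__")
    for ext in INIT_EXTENSIONS:
        candidate = f"{base}.{ext}"
        if os.path.exists(candidate):
            return candidate
    return None
```

One trade-off of this approach: a cached result goes stale if an __init__ file is created or removed during the run, so a real fix would probably need a way to clear the cache (here, has_init_cached.cache_clear()).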

Configuration

[MAIN]
jobs=1

[MESSAGES CONTROL]
disable=all
enable=E0401

[REPORTS]
reports=no
score=no

Command used

Steps to reproduce

git clone https://github.com/yt-dlp/yt-dlp.git
cd yt-dlp
git checkout 5904853ae5788509fdc4892cb7ecdfa9ae7f78e6

cat << EOF > ./profile_pylint.py
import cProfile
import pstats
import sys

sys.argv = ['pylint', '--recursive=y', '.']
cProfile.run('from pylint import __main__', filename='stats')

with open('profiler_stats', 'w', encoding='utf-8') as file:
    stats = pstats.Stats('stats', stream=file)
    stats.sort_stats('tottime')
    stats.print_stats()
EOF

cat << EOF > .pylintrc
[MAIN]
jobs=1

[MESSAGES CONTROL]
disable=all
enable=E0401

[REPORTS]
reports=no
score=no
EOF

python ./profile_pylint.py

Analysis

_has_init calls os.path.exists ~43,000 times:

import pstats

stats = pstats.Stats('stats')
stats.print_callees('_has_init')

Function                             called...
                                            ncalls  tottime  cumtime
astroid/modutils.py:669(_has_init)      ->   42696    0.039    0.236  <frozen genericpath>:16(exists)
                                             21348    0.051    0.086  <frozen posixpath>:71(join)
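
To see which functions trigger the repeated calls, pstats also offers print_callers, the inverse of print_callees. The sketch below is self-contained and profiles a toy stand-in rather than pylint itself; has_init_stub and check_imports are hypothetical names used only for the demonstration. Against the real profile, the equivalent call would presumably be stats.print_callers('_has_init') on the 'stats' file generated above.

```python
import cProfile
import io
import pstats


def has_init_stub(directory):
    # Toy stand-in for the real function; it does no filesystem access.
    return directory.endswith("pkg")


def check_imports():
    # Calls the stub repeatedly with the same argument, mimicking the redundancy.
    for _ in range(3):
        has_init_stub("/tmp/pkg")


profiler = cProfile.Profile()
profiler.enable()
check_imports()
profiler.disable()

# Write the report to a string buffer instead of stdout.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.print_callers("has_init_stub")
report = buffer.getvalue()
print(report)
```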

Pylint output

There may be some import errors depending on your (virtual) environment, but the output is less important than the performance numbers.

Expected behavior

Improved performance via reduced _has_init and stat calls

Pylint version

astroid @ git+https://github.com/pylint-dev/astroid.git@2c38c0275b790265ab450b79e8dc602e651ca9d3
pylint @ git+https://github.com/pylint-dev/pylint.git@7521eb1dc6ac89fcf1763bee879d1207a87ddefa
Python 3.12.3

OS / Environment

Arch Linux

Additional dependencies

No response