pylint-dev/pylint

R1707 (trailing-comma-tuple) checks perform excessive is_message_enabled calls

correctmost opened this issue · 3 comments

Bug description

RefactoringChecker::process_tokens has a loop with an embedded self.linter.is_message_enabled("trailing-comma-tuple") call:

for index, token in enumerate(tokens):
token_string = token[1]
if token_string == "elif":
# AST exists by the time process_tokens is called, so
# it's safe to assume tokens[index+1] exists.
# tokens[index+1][2] is the elif's position as
# reported by CPython and PyPy,
# token[2] is the actual position and also is
# reported by IronPython.
self._elifs.extend([token[2], tokens[index + 1][2]])
elif self.linter.is_message_enabled(
"trailing-comma-tuple"
) and _is_trailing_comma(tokens, index):
self.add_message("trailing-comma-tuple", line=token.start[0])

This call gets executed ~1.4 million times when running pylint on the yt-dlp repo. Hoisting the is_message_enabled call outside of the loop can bring the number of executions down to ~1100.

Configuration

[MAIN]
jobs=1

[MESSAGES CONTROL]
disable=all
enable=R1707

[REPORTS]
reports=no
score=no

Command used

Steps to reproduce

git clone https://github.com/yt-dlp/yt-dlp.git
cd yt-dlp
git checkout 5904853ae5788509fdc4892cb7ecdfa9ae7f78e6

cat << EOF > ./profile_pylint.py
import cProfile
import pstats
import sys

sys.argv = ['pylint', '--recursive=y', '.']
cProfile.run('from pylint import __main__', filename='stats')

with open('profiler_stats', 'w', encoding='utf-8') as file:
    stats = pstats.Stats('stats', stream=file)
    stats.sort_stats('tottime')
    stats.print_stats()
EOF

cat << EOF > .pylintrc
[MAIN]
jobs=1

[MESSAGES CONTROL]
disable=all
enable=R1707

[REPORTS]
reports=no
score=no
EOF

python ./profile_pylint.py

Analysis

process_tokens calls is_message_enabled ~1.4 million times:

import pstats

stats = pstats.Stats('stats')
stats.print_callers('is_message_enabled')

 ncalls  tottime  cumtime
1448094    1.515    3.509  pylint/checkers/refactoring/refactoring_checker.py:650(process_tokens)

Pylint output

There are some R1707 errors, but the output is less important than the performance numbers.

Expected behavior

Improved performance via reduced is_message_enabled calls

Pylint version

astroid @ git+https://github.com/pylint-dev/astroid.git@0ccc2e29d4b9ce0f54f9e50fa6df85522083c5de
pylint @ git+https://github.com/pylint-dev/pylint.git@7521eb1dc6ac89fcf1763bee879d1207a87ddefa
Python 3.12.3

OS / Environment

Arch Linux

Additional dependencies

No response

Wow, this one is shocking, should have been a low hanging fruits during review. Thanks again, this is very valuable.

And in a performance MR of all thing 😄 #8606

Wait, no, we kinda need to do that on each line (not each token though, and we don't need to recheck everything). The enabling / disabling can be per line.