Add a way to ignore the same files as `.gitignore`
leblancfg opened this issue · 9 comments
When using the tool, I need to manually generate an exclusion list parameter that's essentially recreating what my .gitignore
is doing.
Otherwise I end up running the tool on my virtualenv, egg-info, tox, etc. and the initial user experience is that the tool is too slow even on tiny codebases. It would be much simpler to use a flag that tells vulture
to just exclude the same files as in the .gitignore
– a common enough resource in most repos.
One might even go one step further and make that the default – like more modern shell tools like ripgrep
and fd
, but that's likely a separate discussion.
Yes, that sounds reasonable. I'd be happy to review a pull request that uses .gitignore to exclude files if --exclude
is missing from both pyproject.toml and the command line. I suggest to basically copy the feature implementation from black: psf/black#878 with the later patch psf/black#2170
Fixed in #345 .
Wow, that was quick. Thanks for doing this, @whosayn and @jendrikseipp
After further thought, I decided to remove this feature again. The --exclude flag and .gitignore file have different responsibilities and mixing them will surprise some users and will make documentation and maintenance difficult. Also, I'd like Vulture to have no external dependencies, so adding one just for parsing .gitignore files is not ideal. Finally, the feature raised some questions in my mind: why do we only parse the top-level .gitignore file? Why do we parse the .gitignore file in the current directory and not in the directory passed to Vulture?
I should have put the brakes on this feature earlier. Sorry about that!
@jendrikseipp if you don't mind me pushing a bit and seeing if there's possible middle ground, I think I can see a future where these concerns are handled:
The --exclude flag and .gitignore file have different responsibilities and mixing them will surprise some users and will make documentation and maintenance difficult.
By your previous approval here, I was actually expecting this feature to only be merged for the next major version change, clearly indicating to users that there was a breaking change.
It could always be available behind a CLI flag though, and only be opt-in. If users enter both a --exclude
and e.g. --use-gitignore
flags, we emit a warning on STDERR, something like
Both --exclude and --gitignore flags set, only using --exclude patterns.
I'd like Vulture to have no external dependencies
I think vulture
could easily vendor the pathspec library for this purpose. The actual package code is tiny, only a handful of files. IIUC given the pathspec license (Mozilla Public License), that only means adding attribution to the vulture docs if we don't modify pathspec code. Also, it is a mature package so there is less risk of bugs, etc.
Why do we parse the .gitignore file in the current directory and not in the directory passed to Vulture?
IMO this is the default with the other tools out there, and sanely handled in the way you describe. I had a local PR I was intending to push up before #345 merged that handled that, using the same logic as black
's find_project_root
function.
Would you mind if I revived my local branch and gave another stab at this feature, given the above? If so, would you agree with the following?
- We vendor the pathspec dependency
- With a CLI flag called
--gitignore
, users can opt-in to using it as a source of exclusion patterns - If
--exclude
and--gitignore
are given, only useexclude
patterns. - Copy black's
find_project_root
logic to handle the common parent.gitignore
of the relative paths.
Thanks for the detailed proposal! Before we go further here, can you (or others) provide real-world examples of projects where the new feature would be beneficial? I.e., what are example .gitignore files in the wild where all contained gitignore patterns should also be ignored by Vulture?
I see three main patterns that are encoded in .gitignore
:
- "standard" locations for virtual environments, often
.env/
and/or.venv/
build/
,*egg/
anddist/
folders- testing artifacts (coverage, hypothesis, etc.)
I've found that these 5 popular repos have at least 2/3 these folders encoded in the .gitignore
s:
Thanks! But doesn't this suggest that it'd be better to ignore these directories by default, i.e., fill --exclude
with these dirs by default?
However, ignoring these directories assumes that users call Vulture like vulture .
in the project dir. Wouldn't it be much easier to have Vulture analyze the relevant files by passing them on the command line as in vulture src tests
?
But doesn't this suggest that it'd be better to ignore these directories by default, i.e., fill
--exclude
with these dirs by default?
I agree with this statement. It's also the default behaviour for many codebase-focused tools, like black
, ripgrep
, etc. but comes with "change management" maintenance work, like assigning it to the next major version, clear documentation, etc. Two thoughts:
- I prefer your previous suggestion to keep the
--exclude
flag as an override to the.gitignore
patterns, just likeblack
does. Users need a way to override default behaviour. Also limits the breaking changes for users that already have the flag set. - Selfishly as a user, I'd rather have the option to use gitignore as a source of exclusion patterns behind an opt-in flag than not at all.
Ultimately your call here @jendrikseipp, and one we can hopefully discuss in an upcoming PR.
Wouldn't it be much easier to have Vulture analyze the relevant files by passing them on the command line as in
vulture src tests
?
I guess this line of discussion boils down to aesthetics. Gitignores also handle more complex patterns like "exclude setup/
except for setup/main.py
" etc. You could suggest users use something like:
vulture $(git ls-files --exclude-standard --others)
Personally, that's not very elegant or user-friendly – I don't think it's fair to assume users know git
this well. I find it annoying working with tools (e.g. pylint
) that force you to be this verbose to recreate allow/exclude patterns that are already encoded in a file that serves exactly that purpose.