github-linguist/linguist

Reconsidering auto-detecting Yarn files as auto-generated

Closed this issue Β· 28 comments

Preliminary Steps

Please confirm you have...

Problem Description

#3431 added support for generated file detection for Yarn. I'm not sure this is a good idea.

While generated, the lockfile contains important information regarding the dependencies used by a project; maybe it would make sense to display them so that they can be properly reviewed? For example, it would be important for a reviewer to understand that a PR adds N packages to the lockfile (which might be much higher than the number of lines added to the package.json file).

This might also have security implication. A malicious author could easily change some resolved versions inside the yarn.lock file while also upgrading a package.json dependency to a innocent release. Since Github won't show the lockfile content by default the reviewer might forget to check it, accept the apparently harmless upgrade, and let the malicious override make its way to the lockfile.

Note that the same concerns might apply to the package-lock.json file from npm (except that they are much harder to review due to their nested design).

Its very unpleasant to review lockfiles, many devs glance over it without paying attention. I agree with you that marking it as auto generate would bring in more vulnerable packages.

Is there a way to validate the lockfile was derived from the package.json for example, make sure the PR doesn't downgrade a package from the previous already committed file given the version within package.json. Github should create such server side hook by default instead of devs placing that action in their build definition.

I think it could be solved in interesting ways by offering a dedicated diff view for lockfiles that would pretty-print the changes (similar to how images are directly displayed rather than being shown as binary diff). That would require development time on the Github side, though, so in the meantime removing the file from the generated list would be a good start.

Your reasoning makes sense to me, however it's important to keep in mind that this has been in place for nearly two years so will potentially startle devs when they start to see these appearing in their diffs. After recent furores around simple things like colours, I'm apprehensive making such a noticeable change without larger community buy-in and support first.

Is there a general Yarn community discussion or consensus on this change that we can reference when making such a change?

If there isn't, are you able to raise such a discussion on the appropriate community forum that can be used as qualification for this?

Is there a general Yarn community discussion or consensus on this change that we can reference when making such a change?

Nope, but this one is as good as any! I maintain Yarn on a daily basis so on our side I don't think it'd be an issue. I can also tweet this discussion from the Yarn twitter account if you wish?

Paging some other contributors if you want to shim in: @rally25rs / @Gudahtt / @bestander

Let's also ping the person who made the change we want to revert, in case they have something to add: @sunderls.

Trying to catch up on the discussion... It doesn't look like #3431 was ever merged, so I'm guessing this functionality isn't actually in-place?


My take on this is that the behvior yarn.lock should follow whatever other lockfiles do; npm's package-lock.json, Ruby bundler's Gemfile.lock, etc...

Personally, I want to know when my lockfile changes. Changes to these lock files represent actual functional changes to the final system (a version change to a dependency can have breaking changes).


The other thing to note here is that I assume the intent behind the "generated files" behaviors is that "this generated file change is the result of a different file changing" for example changing a coffeescript file will result in a corresponding "generated"/compiled js file change.

Though lockfiles are generated from package.json, a change to one does not always mean a change to the other. A yarn.lock file can change without package.json changing, which could indicate that someone changed the version of a dependency incorrectly.


To @mohamedmansour 's comment "Its very unpleasant to review lockfiles" that can be true, however to me it's more of a "am I expecting this file to have changes?" thing. If I don't expect that file to change but it shows up in my diff, then I know something is wrong. If the changes are so massive that its "unpleasant" to review, then you probably make some dependency updates and are expecting a big change to that file. That's the case where I don't actually inspect it line by line.

If you are talking about it from a package maintainer / PR reviewer perspective, then I do think it's even more important to show that change for the reasons @arcanis mentioned. If someone is manipulating your dependencies, they should have a good reason. Sure they can still hide a subtle version change amongst some other "noise" in the file changes, but that's no excuse for a maintainer to say " πŸ€·β€β™‚οΈ eh, too complex. Approve/merge. Whatever."

It doesn't look like #3431 was ever merged, so I'm guessing this functionality isn't actually in-place?

It was, in a different pull request, #3432.

Ping? πŸ™‚

@arcanis's reasoning is sound. NPM is plagued with enough security concerns as it is, I don't think we should be adding one more potential attack vector for malicious users.

In most cases, yarn.lock files will be listed at the end of a PR's file diffs, making them pretty easy to overlook. Since dependencies don't get changed/added by contributors too often, it's unlikely that yarn.lock files will cause too much noise in the long run.

+1 for removing yarn.lock from the list.
since large diff is already auto collapsed, its not gonna be a problem.

Another datapoint: when I review Renovate bot dependency update PRs on mobile, there is no way to view the contents of the automatically hidden yarn.lock changes without first switching to desktop view. This means the I have to use the frustrating workflow of:
open PR at files tab -> switch to desktop view -> click the expand link for yarn.lock and review the changes -> switch back to mobile view -> approve PR -> ...

Having yarn.lock changes be visible by default would save having to do this.

Would one of you like to open the pull request to revert #3431?

stale commented

This issue has been automatically marked as stale because it has not had activity in a long time. If this issue is still relevant and should remain open, please reply with a short explanation (e.g. "I have checked the code and this issue is still relevant because ___."). Thank you for your contributions.

stale commented

This issue has been automatically closed because it has not had activity in a long time. Please feel free to reopen it or create a new issue.

Would one of you like to open the pull request to revert #3431?

Done. See #4445.

@Alhadis hi! Thank you for opening #4445. However it seems to have only reverted the tests, rather than the logic that implements the behaviour?

I've opened a follow-up, PR #4459.

Damn, sorry about that. πŸ˜• I kind of assumed reverting a merge commit would undo everything else in a PR, but I guess not. Lesson learnt.

@Alhadis have you considered following suit for npm lockfiles (package-lock.json and npm-shrinkwrap.json), as suggested in the OP? It seems odd to me for Yarn's lockfiles to behave differently than npm's.

If the linguist maintainers prefer to treat package-lock.json files as auto-generated, is there a way for Github users to override that preference for individual repos?

@lildude Thanks for taking the time to reply. As a poor soul who just uses Git and isn't really familiar with Linguist, I'm not sure what the inverse of that looks like. Unless I'm missing it, that doc doesn't seem to explain how to inverse a statement.

Could I bother you for a concrete example of what I would need to add to my .gitattributes file in order to not treat package-lock.json files as auto-generated?

Something like... ?

!package-lock.json linguist-generated
package-lock.json !linguist-generated
package-lock.json NOT linguist-generated
package-lock.json not-linguist-generated```

@Ghazgkull See the third example for linguist-detectable: a dash negates your attribute; =false should also work:

package-lock.json -linguist-generated
package-lock.json linguist-generated=false

I have to strongly disagree with marking yarn lockfiles as non-generated because that's what they are: generated by a machine, not meant for human consumption. They are certainly polluting diffs, e.g. on medium sized project a dependency update it's still a 1-2k line change.

The fact that yarn lockfile is the only lockfile that is not considered generated is rather disturbing, but I guess it may be time to migrate off of it anyways πŸ˜‰.

not meant for human consumption

You're meant to review them - so insofar as you are a human, they are in part meant for human consumption.

In an ideal world, yes. But realistically no one is really going to review a 2k lockfile change in a big react project for example.

If the author is trusted, I just look at package.json and skip over the lockfile. If not, I pull the branch and regenerate the lockfile to see if it matches the provided one.

Yarn's changesets have very rarely that many changed lines - since we don't encode the recursive dependency tree via a nested structure, adding a package is often just 5-10 lines. Having more than that is actually a useful signal, since it shows that perhaps you're installing more packages than you expected.

That being said we also have more improvements in the pipeline to improve the general security even further, and having a reviewable lockfile is important to this goal.

Yeah, yarn diffs are usually not that bad. For example order of dependencies in the file seems stable (as opposed to npm which sometimes reorders for no apparent reason) and YAML also reduces line noise.

But still I think it should be collapsed by default in diffs because while I generally fly over lockfiles while reviewing, I prefer to have such files openable on-demand.