yarnpkg/yarn

🦁 Yarn 2.0 Working Group

arcanis opened this issue · 45 comments

Alright @yarnpkg/core (and community!). Let's discuss what we want for the 2.0 release 🙂

Ideas, in no particular order:

  • Drop support for Node 4 (#5736)

  • Revamp the command line implementation (switch from Commander to Yargs, remove the complicated custom logic as much as possible)

  • Harmonize the command line (an example in mind is the --pattern option, which sometimes is an option like for yarn list, sometimes a -P,--pattern option like for yarn upgrade, sometimes a --scope option that only takes the scope like for yarn upgrade-interactive, and sometimes an argument that doesn't glob like for yarn outdated)

  • Change the lockfile syntax to a YAML-like file format (YAML-like because we would just slightly tweak our parser to emit something YAML-compatible, but we wouldn't support any advanced YAML feature)

  • Remove the code that deals with multiple registries (yarn / npm, cf src/registries). The yarn registry being a mere mirror of the npm registry, this doesn't make sense.

Anything else you think would be important to ship during this version? Especially in terms of breaking changes requiring a semver-major bump?

About the YAML lockfile - I think it's great. Maybe we can do some revamping of the format while we're there? eg. lose shasum in favor of integrity?
Actually, maybe we can implement this: https://github.com/yarnpkg/rfcs/blob/master/accepted/0000-registry-url-in-lock-file.md
What does everyone think?

As you can imagine, I like the CLI stuff, that is ripe for a large cleanup for sure. The lockfile change I'm less excited about because I don't really see the clear value that it would bring. Can you explain that a bit more?

What other large features should we consider for a Yarn 2.0 release that we haven't thought of yet?

Can you explain that a bit more?

It's a bit clunky in terms of interop to have our own file format. Sure we have the @yarnpkg/lockfile package to parse/write it, but I feel like it would be better if we didn't have to reinvent the wheel when there are fine solutions outside.

Iirc the initial reasoning behind the custom lockfile format was that JSON isn't convenient to read for humans, and that YAML parsers are too big. I kinda agree with both but:

  • If size is a concern, then JSON.parse is native and would remove the parser from the equation (750 lines ... we would still keep the conflict resolver so let's say 700). It depends on how much it's a concern vs readability.

  • Not using a YAML parser doesn't mean we can't have a YAML-compatible syntax. We're almost compatible, except for a few quotes and colons here and there. It makes me sad to be so close but so far 🙂
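To make the "few quotes and colons" point concrete, here is a minimal sketch that adds the missing colon to indented `key value` lines of a v1-style lockfile. It is not Yarn's parser and is not guaranteed to produce valid YAML for every lockfile; `toYamlLike` and the sample entry (`example-pkg`, `@scope/dep`) are made up for illustration.

```javascript
// Sketch only: adds the missing colon to `key value` property lines of a
// v1-style lockfile. Not Yarn's parser, and not a full YAML emitter.
function toYamlLike(lockfileText) {
  return lockfileText
    .split('\n')
    .map((line) => {
      // Only indented `key value` pairs match. Block headers such as
      // `dependencies:` and top-level entry lines pass through untouched.
      const match = /^(\s+)("[^"]*"|[^\s":]+)\s+(\S.*)$/.exec(line);
      return match ? `${match[1]}${match[2]}: ${match[3]}` : line;
    })
    .join('\n');
}

// Made-up lockfile entry, for illustration only.
const sampleLockfile = [
  '# yarn lockfile v1',
  '',
  'example-pkg@^1.0.0:',
  '  version "1.0.0"',
  '  dependencies:',
  '    "@scope/dep" "^2.0.0"',
].join('\n');

console.log(toYamlLike(sampleLockfile));
```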

The only reason for me not to change how things currently work is that the lockfile file format is a core component, so we would have to make sure that the 2.0 would be able to convert the old format into the new format, whether it's JSON or YAML. Then we could remove this logic starting from 3.0 or 4.0. It's a bit of work, so maybe it's not worth it, but I still think it's worth being considered 🙂

Actually, maybe we can implement this: https://github.com/yarnpkg/rfcs/blob/master/accepted/0000-registry-url-in-lock-file.md

Yup, that sounds like a good idea (I have a few comments on the proposal itself, but I'll post it to yarnpkg/rfcs#84).

I'd also like to remove the part of the code that deals with multiple registries (not talking about registry urls - we have branches for the "yarn registry" and for the "npm registry", which doesn't make sense since the yarn registry is effectively the npm registry).

Looking at the code, I think it was originally used to have a different configuration between npm and yarn, but I haven't seen it used anywhere, even after a quick search on GitHub, and it complicates the codebase a lot (especially since it isn't actually implemented everywhere and isn't well tested, so I'm not even sure it works ...).

I feel like there isn't a good enough justification to change the lockfile format: it would churn all existing lockfiles because we believe aligning it with YAML is a good thing. In hindsight we should have probably gone with YAML but now the cost of changing doesn't seem worth it if the upside is only hypothetical.

The lockfile change I'm less excited about because I don't really see the clear value that it would bring.

I believe the complaint from users is that @yarnpkg/lockfile is nice to have, but only if you are implementing a JS utility. I remember someone saying that they wanted to write something in Python that does some processing of the lockfile.


This might be a good opportunity to deprecate the link: dependency type. It's a bit flaky to actually use (if the package you link to has a node_modules dir then node will resolve in there, instead of in the parent package that is including the link) and I think the intent behind it has been supplanted by workspaces. Alternatively, we instruct people to use yarn link instead of link: dependencies.


As we did with yarn v1, I think we should include all the high-priority tagged issues and try to resolve them (there are 35 currently).

Also, we could use a GitHub "Project" to track v2 tasks, like we did for v1. I thought that worked well.

Oh I've got one to add to the list: automated build & release, and defining a release cadence. Right now it feels sort of random when we release a new version, and it leads to a lot of "when will this PR be released" questions.

It'd be nice to do something like "on the first of every month we will release a bug fix increment"

BYK commented

There's also this that I really want to address: #4147 and #4379.

I'll share my proposal on this soon on that ticket.

The lockfile change I'm less excited about because I don't really see the clear value that it would bring.

The custom format of the lockfile is a constant source of confusion for people. It is not well-documented, is not supported by syntax highlighters (they treat it as YAML), and we need to maintain our own parser package which we don't really do a good job of.

If the lockfile was a limited subset of YAML it may resolve most of the issues I listed above. @imsnif's suggestion is also about the lock file but more about its contents rather than the format.

What other large features should we consider for a Yarn 2.0 release that we haven't thought of yet?

I think 2.0 is about breaking changes, not necessarily large changes.

Has yarn stabilized to the point where it could expose an API (#2740)? It's not a breaking change per se, more the opposite.

Definitely +1 for cleaning up command line arguments as they're really confusing. As a user I understand @cpojer's concern about having to basically recommit your entire lockfile with the upgrade, but I think dropping the custom parser and exposing the lockfile as plain YAML would be better for the future's sake.

Few other thoughts I had in mind:

  1. Come up with a documentation-driven model for development in Yarn -- it's hard to sync between code changes, cli options, and docs which live on a separate repo. Maybe start with every PR needs an accompanying docs change unless it's something like a chore?
  2. A new command yarn run-parallel that allows you to run multiple commands together like yarn run-parallel lint test. Also possibly implement the --pattern flag to run commands, e.g. yarn run-parallel --pattern build?
  3. Yarn check has not been kept up-to-date with the new features of yarn and there seems to be confusion/lack of documentation about --verify-tree, --integrity, etc. #2287

raido commented

The lockfile format should not be a breaking change in v2.0. A new format can be introduced, but the current one should also be supported. Otherwise it would cause churn in large teams, as not all engineers will upgrade Yarn on their machines at the same time.

A clear migration path should be provided.

Or maybe an opt-in new format should be implemented in the current v1.x releases, with a deprecation message for the current one.

Agreeing with the sentiment that the v1 lockfiles should still be usable with v2, but there's another option that I'd also be very content with.

A separate package could be created to migrate lockfiles from v1 to v2 (and possibly later from v2 to v3).

This would allow Yarn v2 to remove all logic for the old v1 lockfiles while still making it very easy for developers to update to v2.

Just throwing a ball.

Don't worry: whatever happens, v1 lockfile would still be readable in v2 🙂 We simply would convert them on the fly to the new format the next time you run yarn install (except when lockfile modifications are locked by one of the command line options).

raido commented

What if person A installs v2, updates the lockfile to the new format. Then person B checks out the repo and runs “yarn install” with v1 on the v2 lockfile?

I don't think we can do much about this. It's the same thing as if someone starts using workspaces while their colleagues are still using an old Yarn release, or npm, or a different package manager, etc. That's a reason why we recommend checking in the Yarn version in the repository, and using the yarn-path configuration setting to force this version to be used.

I don't think this'd really be a user-facing thing, but it'd be nice if the repo was reorganized a bit. Like, make @yarnpkg/lockfile its own subproject in the repo that yarn requires instead of another bundle that it spits out. It might also pave the way for exposing more of the API.

raido commented

Checking in Yarn in private projects might work. But what about people who contribute to multiple open source projects which use v1 and v2 both? Do they need to then switch manually their Yarn version or create some “nvm” like solution?

Obviously this might not be an issue at all, just expressing my thoughts.

It would be solvable if we release a minor which could read, but wouldn’t write the new format. Everyone on 1.x who didn’t upgrade to 2 yet then will be able to read both formats

But what about people who contribute to multiple open source projects which use v1 and v2 both? Do they need to then switch manually their Yarn version or create some “nvm” like solution?

They would just check in the version of Yarn they use for each project inside all of those repositories, and put a different yarnrc file into each project, that would point to the location of their copy of Yarn (still using yarn-path). Whatever global Yarn they have would then delegate its calls to the local one.

It would be solvable if we release a minor which could read, but wouldn’t write the new format.

That would slightly extend the window of forward compatibility, but people using Yarn 1.6 still wouldn't be able to read those files, for example. We unfortunately don't have stats regarding which versions of Yarn are in use, so I don't know how frequently people update 🤔

Regarding the scenario lockfile V1 -> (switch to Yarn 2) -> V2 -> (switch to Yarn 1) -> V1:

  • We could backport a migration V2 -> V1 into Yarn V1.
  • Yarn V1 should check the version of the lockfile, so that we avoid unexpected behavior
  • And what would be the minimal change to become a subset of YAML? Can V2 be forward compatible?
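A version check like the one suggested above could be sketched as follows. This assumes the lockfile keeps a header comment of the form `# yarn lockfile v<N>`; `detectLockfileVersion` and `assertReadable` are hypothetical helper names, not existing Yarn functions.

```javascript
// Sketch: read the format version from the header comment of a lockfile.
// Assumes a header line of the form `# yarn lockfile v<N>`.
function detectLockfileVersion(lockfileText) {
  const match = /^# yarn lockfile v(\d+)\s*$/m.exec(lockfileText);
  return match ? Number(match[1]) : null;
}

// Hypothetical policy: fail with a readable message rather than misparse
// a format this release does not know about.
function assertReadable(lockfileText, supportedVersions) {
  const version = detectLockfileVersion(lockfileText);
  if (version === null) {
    throw new Error('Unrecognized lockfile: no version header found');
  }
  if (!supportedVersions.includes(version)) {
    throw new Error(
      `Lockfile v${version} is not supported by this release; please upgrade Yarn`
    );
  }
  return version;
}
```

A 1.x release would call `assertReadable(text, [1])`, so a newer lockfile produces an explicit upgrade message instead of a confusing parse error.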

Regarding API - I always want it.
We could have many implementations and plugins for the linking phase; for example, the feature that creates hardlinks in node_modules for duplicated modules could totally be a plugin.

BYK commented

No need to worry about v1 not being able to read the new lockfile format. This is exactly why the change is being done in a major release. It is also not safe to assume Yarn will produce the same results across major versions. This is also part of our commitment to semver.

No need to worry about v1 not being able to read the new lockfile format. This is exactly why the change is being done in a major release. It is also not safe to assume Yarn will produce the same results across major versions. This is also part of our commitment to semver.

Adoption V1->V2 will take time and people will get confused no matter what we do.
Fail-fast approach (like "Yo dawg, update your Yarn, and now I crash") is easier but I think we could be more graceful.
Personally I think semver is a lie :)

I would love a direct integration with NSP (Node Security Project), ideally. Security is a huge concern but thus far NPM and Yarn have implemented lockfiles and package verification to handle man in the middle attacks. But vulnerable/compromised packages still require manual setup for every project.

We unfortunately don't have stats regarding which versions of Yarn are in use, so I don't know how frequently people update 🤔

@arcanis Perhaps we could get User-Agent stats from CloudFlare, if they can provide raw access logs. If I remember correctly, we include the Yarn version number in the user-agent. That won't tell us all Yarn versions in use (as many companies would be using it with a local mirror and thus never actually hit the public registry) but it'd at least give us some rough numbers.
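If raw logs were available, the tally itself would be simple. Here is a rough sketch, assuming the version shows up in the User-Agent as a `yarn/<version>` token; the exact format is an assumption, and `tallyYarnVersions` and the sample strings are made up.

```javascript
// Sketch: count Yarn versions seen in a list of User-Agent strings.
// Assumes the version appears as a `yarn/<major>.<minor>.<patch>` token.
function tallyYarnVersions(userAgents) {
  const counts = new Map();
  for (const ua of userAgents) {
    const match = /(?:^|\s)yarn\/(\d+\.\d+\.\d+\S*)/.exec(ua);
    if (!match) continue; // not a Yarn client, or an unexpected format
    counts.set(match[1], (counts.get(match[1]) || 0) + 1);
  }
  return counts;
}

// Made-up sample data, for illustration only.
const sampleAgents = [
  'yarn/1.6.0 npm/? node/v8.11.1 linux x64',
  'yarn/1.6.0 npm/? node/v10.1.0 darwin x64',
  'yarn/1.3.2 npm/? node/v8.9.4 win32 x64',
  'npm/5.6.0 node/v8.11.1 linux x64',
];

console.log(tallyYarnVersions(sampleAgents));
```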

Maybe we can also address #3630 - if we are going to change the lockfile format, we could consider also adding the hoist location so that we can stop resolving devDependencies during --production installs.

As this will be a new major version, was it ever discussed to maybe adopt parts of pnpm's behavior? I'm just curious. Thank you for your work, everyone.

How about using symlinks inside node_modules instead of copying all the dependencies to node_modules? This can save a lot of disk space and prolong the life of SSD.

Not quite yet, no. Symlinks (and hardlinks) have some portability issues and do not behave better in every case, so right now it's not a priority. That being said I plan to review our linking process soonish (probably post-2.0), and it might be revisited then.

Any movement on minor changes like this 🙃 #5625
Also, any effort to improve speeds around offline cache would be really nice.
Seconding a clear migration path via a codemod or script/guide that is easy to follow, if necessary.

Hi, this will probably not be a popular request here but as a Yarn user, I'd appreciate closer "UX compatibility" with npm. For example, some command line switches are different between Yarn and npm, running scripts can produce different results (and 1.6.0 has been seriously broken in Git Bash), etc.

The goal here is that as long as I don't want to use Yarn-specific features like workspaces (which are awesome BTW), there is no mental overhead for me switching between the two.

Not sure how actionable this is but I thought that this is a good issue to mention it.

Also, I'm sometimes wondering what you guys think about the npm client and its recent advancements (last year in review and their plans for the future).

When Yarn came out, it was sorely needed and just the speed improvements alone were game-changing. Plus, npm's locking model was either absent or seriously broken for a long time. But at this point, the two projects really seem and feel almost the same, apart from a couple of differences like workspaces or npx.

I don't have any insight into whether the teams like / dislike each other, whether Yarn is of some strategic importance to Facebook or whether server-side registries play any major role but it almost feels like the two projects could start thinking how to unite, perhaps?

Note that I think this discussion should be in its own thread rather than on the WG for the v2 😉

Since we're multiple maintainers from multiple horizons I can only speak from my own personal perspective, which is that I'm happy for them to see their tool being actively maintained again. Still, I'm of the opinion that each project has its own strengths, goals, and policies, and there would be little point in merging them.

An interesting data point is that Yarn now accounts for about 40% of all the requests to the npm registry, and is constantly growing (~20% since the v1 iirc?). I think it bears witness that there's a need for alternative projects - whether it's Yarn or other interesting projects like pnpm.

Overall, I personally don't care much about competition, which is why you'll rarely see me compare Yarn and npm on Twitter - let's all try to make the best projects we can, and everything will fall into place!

I don't have any insight into whether the teams like / dislike each other

As far as I can tell we don't dislike each other, the npm folks have even collaborated with us on a few PRs, and some maintainers here also contributed back to npm 🙂

How about using symlinks inside node_modules instead of copying all the dependencies to node_modules?

Copy-on-Write is probably better than symlinks or hardlinks, but requires a file system that supports it (such as btrfs). Copy-on-write reuses the same bytes on the disk when the files are copied (like a hardlink), but a copy is created if you write to the file (so directly editing a file in one project's node_modules won't affect the global cache).

Symlinks might not be easily doable since they tend to require userland support (and not all tools support them), but hardlinks or CoW should work as they're transparent to the app.
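For what it's worth, Node exposes this through the `COPYFILE_FICLONE` flag of `fs.copyFile`, which attempts a copy-on-write reflink and falls back to a regular copy when the file system doesn't support it. A minimal sketch (`cloneOrCopy` is a hypothetical helper, not something Yarn has):

```javascript
const fs = require('fs');

// Sketch: copy a file out of the cache, asking for a copy-on-write clone.
// COPYFILE_FICLONE falls back to a regular copy when the file system
// cannot create a reflink, so the call is safe on any platform.
function cloneOrCopy(source, destination) {
  fs.copyFileSync(source, destination, fs.constants.COPYFILE_FICLONE);
}
```

Either way the destination is an independent file from the app's point of view: editing it does not touch the cached original.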

We've been talking over at #5654 that projects having both a yarn.lock file and a package-lock.json file are probably doing something wrong. In this context, it would be interesting to throw an error when it happens, at least on CI.
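The detection part is cheap. A minimal sketch, assuming both lockfiles sit at the project root (`hasConflictingLockfiles` is a made-up name):

```javascript
const fs = require('fs');
const path = require('path');

// Sketch: report whether a project directory contains both lockfiles.
function hasConflictingLockfiles(projectDir) {
  return ['yarn.lock', 'package-lock.json'].every((name) =>
    fs.existsSync(path.join(projectDir, name))
  );
}
```

Whether this should be a warning locally and an error on CI is the open question; the check itself is the same.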

I remember @BYK is a fierce advocate of frozen-lockfile being pushed a bit more than it currently is. Maybe those two rules would be a good basis for an official CI mode?

  • Would be enabled by default if the CI environment variable is set (maybe others)
  • Could be enabled manually with --ci (should we make it disableable?)
  • Would print an info message linking to the documentation and explaining the CI mode
  • Would set --frozen-lockfile
  • Would upgrade warnings into errors (all of them? some of them?)

What do you think? This would give Yarn a standardized way to enforce correctness where it matters the most.
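A rough sketch of how the mode could be resolved from the rules above. `--ci` comes from the proposal; `--no-ci` is an assumed spelling for the opt-out, and `resolveCiMode` is a hypothetical helper. None of this exists in Yarn today.

```javascript
// Sketch of the proposed CI mode. `--ci` and `--no-ci` are hypothetical
// flags taken from the proposal, not existing Yarn options.
function resolveCiMode(env, argv) {
  // On by default when the CI variable is set to any non-empty value.
  let enabled = Boolean(env.CI);
  if (argv.includes('--ci')) enabled = true;
  if (argv.includes('--no-ci')) enabled = false; // assumed opt-out
  return {
    enabled,
    frozenLockfile: enabled, // CI mode implies --frozen-lockfile
    warningsAsErrors: enabled, // open question: all warnings or only some?
  };
}

console.log(resolveCiMode(process.env, process.argv.slice(2)));
```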

BYK commented

I like your proposal @arcanis. I think the major concern around having a CI mode was having two different operating modes without a clear signal. If people are okay with that and if we can make our logging system a bit more structured with proper levels to silence this on internal systems like ours to reduce noise, I'm game for this change.

That said, I still think changing the package.json file and running yarn install without a clear intention to update the lock file is a bit unsafe and potentially confusing.

nevir commented

Would like to lobby for #3330 for 2.0, too

@nevir Did you know something we didn't? 😆

Btw @yarnpkg/core, I opened a github project to reference what we want to do: https://github.com/yarnpkg/yarn/projects/4
I think a few things discussed here are still missing in the project, I'll add them later.

Feel free to assign yourself any task you'd like to work on (this also applies to non-core contributors! This is a great opportunity to do impactful things for Yarn, so please ping me and I'll do my best to help you get started!). I think I'll release a 1.7.1 next Tuesday, then we'll be able to freeze for a month or two, the time it takes to implement all this.

not sure if it is too late, or is this idea better suited for 2.0 or a separate RFC... figure I will throw it out here first:

on the high level: hoisting has caused a lot of confusion and tool compatibility issues, for all package managers actually. we can provide a new hoist scheme (let's call it transparent hoisting for now), which achieves redundancy reduction transparently (via OS features such as hardlinks) instead of moving/consolidating modules around like the current hoisting scheme. We will get even better optimization (for example, right now only 1 version of the module can be hoisted, but with hardlinks, all versions can be "hoisted") with a much more intuitive module graph and better 3rd-party tool compatibility.

Sure hardlink might not be as portable as we would like today, but we don't always have to go to the lowest denominator... we could offer this feature, in parallel with the current hoisting scheme, then gradually improve its coverage/compatibility without holding back the majority of the community.

We already have the hardlink capability, but it still works under the hoist scheme. The current hoisting logic can be greatly simplified as most of the complexity is not needed. This feature might sound radical, but I think we already have most pieces available, just need a new way to assemble them...

I think this should be refactored. Can use async generator function + for await (const step of this.installSteps()) { } (for ex) - or, an async function + generator function that yields these promises;
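A minimal sketch of the shape I have in mind. The `Install` class and the `resolve`/`fetch`/`link` step names are placeholders, not the actual Yarn code:

```javascript
// Sketch: install steps driven by an async generator. Class and step
// names are placeholders for illustration.
class Install {
  async resolve() {}
  async fetch() {}
  async link() {}

  // Each step runs only when the consumer asks for the next value, and
  // the generator reports the step name once that step has completed.
  async *installSteps() {
    await this.resolve();
    yield 'resolve';
    await this.fetch();
    yield 'fetch';
    await this.link();
    yield 'link';
  }

  async init() {
    const completed = [];
    for await (const step of this.installSteps()) {
      completed.push(step); // a real implementation could report progress here
    }
    return completed;
  }
}
```

The loop gives one place to hook progress reporting or early exit, instead of threading callbacks through every step.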

What do you guys think? @rally25rs @arcanis

v2 plans are detailed here: #6953, closing this issue 🙂