microsoft/rushstack

[rush] Rush build cache design is incompatible with Next.js

elliot-nelson opened this issue · 2 comments

Summary

The design of Rush's build cache is incompatible with Next.js in a couple of key ways. It would be nice if this experience could be ironed out.

Details

The key thing about Rush's build caching design is that it operates at the folder level (each folder is either a build input or a build output). This differs from Turborepo, which defines build inputs (the cache key) and outputs (the cache value) explicitly as glob patterns.

As an example, the primary build output of a Next.js app is the .next folder (this is not configurable). However, the Next.js build also produces a cache folder that speeds up local development, which lives inside the build folder at .next/cache. Again, this is not configurable. For Rush, there's no way to configure .next as a build output without including the cache folder, which means the cached build output keeps growing with unnecessary temporary artifacts.

In a reverse example, a plugin for Next.js (next-pwa) produces a special file, public/service-worker.js, which collapses the contents of your routes in the .next folder into a special worker script. Again, this file location is hard-coded (if it were public/generated/service-worker.js, it'd be no issue, as we could make public/generated a build output).
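To make the two examples concrete, here is a minimal sketch of folder-level output selection. The helper `filesCapturedByFolders` and the sample file list are hypothetical and only model the behavior described above; this is not Rush's actual implementation.

```typescript
// Hypothetical model of folder-level caching: an output is a whole folder,
// so every file at or below a declared folder is captured.
function filesCapturedByFolders(files: string[], outputFolders: string[]): string[] {
  return files.filter((file) =>
    outputFolders.some((folder) => file === folder || file.startsWith(folder + "/"))
  );
}

// Illustrative file list for a Next.js project using next-pwa (assumed paths).
const projectFiles: string[] = [
  ".next/server/pages/index.html",
  ".next/cache/webpack/0.pack",
  "public/service-worker.js",
  "public/favicon.ico",
];

// Declaring ".next" as the output folder pulls in the cache folder
// and leaves out the generated worker script.
const captured = filesCapturedByFolders(projectFiles, [".next"]);
console.log(captured);
// [ '.next/server/pages/index.html', '.next/cache/webpack/0.pack' ]
```

Adding "public" as a second output folder would capture the worker script, but it would also capture public/favicon.ico, which is a source file rather than a build output.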

Root causes

The root cause of these issues is the difference in implementation. In Vercel's Turborepo config, it is trivial to specify ['.next/**', '!.next/cache', 'public/service-worker.js'] as the build output for your project (and similarly on the other side for your build inputs). Because it's trivial, nobody bothers to implement things like "customizing your cache folder location".
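As a rough illustration of how such a pattern list selects files, here is a small self-contained sketch. The functions `globToRegExp` and `selectOutputs` are hypothetical and implement only a simplified subset of glob syntax (`*`, `**`, and a leading `!` for exclusion). The sketch also assumes that a pattern matches everything beneath the path it names; real glob libraries differ on that point.

```typescript
// Converts a simplified glob into a regular expression.
// Supported: "**" (any characters, including "/") and "*" (any characters except "/").
function globToRegExp(glob: string): RegExp {
  let source = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob.charAt(i);
    if (ch === "*") {
      if (glob.charAt(i + 1) === "*") {
        source += ".*";
        i++; // consume the second "*"
      } else {
        source += "[^/]*";
      }
    } else {
      // Escape characters that are special in regular expressions.
      source += ch.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
    }
  }
  // Assumption: a pattern also matches anything below the path it names.
  return new RegExp("^" + source + "(?:/.*)?$");
}

// Keeps files that match at least one positive pattern and no "!" pattern.
function selectOutputs(files: string[], patterns: string[]): string[] {
  const include = patterns.filter((p) => !p.startsWith("!")).map(globToRegExp);
  const exclude = patterns
    .filter((p) => p.startsWith("!"))
    .map((p) => globToRegExp(p.slice(1)));
  return files.filter(
    (file) => include.some((re) => re.test(file)) && !exclude.some((re) => re.test(file))
  );
}

// Illustrative file list (assumed paths).
const exampleFiles: string[] = [
  ".next/server/pages/index.html",
  ".next/cache/webpack/0.pack",
  "public/service-worker.js",
  "public/favicon.ico",
];

const selected = selectOutputs(exampleFiles, [
  ".next/**",
  "!.next/cache",
  "public/service-worker.js",
]);
console.log(selected);
// [ '.next/server/pages/index.html', 'public/service-worker.js' ]
```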

But for Rush, if you can't customize your outputs to fall into expected folder structures, the caching solution doesn't work.

Possible solutions

I'm raising this as a potential concern going forward; in an ideal world, I could configure Rush's build cache with globs, or perhaps we'd offer both options, with the "whole folder" approach considered slightly faster and more stable.

Sometimes a "build wrapper" can fix this kind of issue for us, but in this case I don't think it works -- the "Next.js experience" still has to work for the local developer, and if running "next start" expects everything to be in the right folders, then you can't escape the fact that the build output has to live there.

At one point we were considering switching to globs, but opted to stick with folders for the following reasons:

  1. It is absolutely critical in a phased build environment that we be able to enforce that the build outputs from all phases are strictly disjoint. Calculating this for a collection of glob patterns sounds like a performance nightmare at best, assuming it is even feasible.
  2. When reading from the build cache, we currently use the native tar binary to perform the unpacking. One of the constraints of using the tar binary is that, for safety, we have to unpack into an empty folder to ensure that the unpack operation doesn't walk through symbolic links into unexpected file locations. (Admittedly, more recent experiments have shown that Node.js using a Worker thread can actually outperform native tar at unpacking a .tgz file.)
  3. Also when reading from the build cache, Rush first needs to purge all existing build outputs. Being able to perform full recursive folder deletes offers the performance optimization of being able to temporarily move the folder and asynchronously delete its contents rather than having to do the full walk and unlink up front.
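To illustrate the first point, here is a minimal sketch of why the disjointness check is cheap when outputs are whole folders: two folders overlap only if one equals or contains the other, which is a plain prefix comparison. The `PhaseOutputs` shape, the function names, and the phase and folder names are all hypothetical and are not taken from Rush's code.

```typescript
// Hypothetical shape: the output folders declared by one build phase.
interface PhaseOutputs {
  phase: string;
  folders: string[];
}

// Normalizes a relative folder name: forward slashes, no leading "./", no trailing "/".
function normalizeFolder(folder: string): string {
  return folder.replace(/\\/g, "/").replace(/^\.\//, "").replace(/\/+$/, "");
}

// Reports every pair of folders from different phases where one folder
// equals or contains the other.
function findOverlaps(phases: PhaseOutputs[]): string[] {
  const claims: Array<{ phase: string; folder: string }> = [];
  for (const { phase, folders } of phases) {
    for (const folder of folders) {
      claims.push({ phase, folder: normalizeFolder(folder) });
    }
  }

  const overlaps: string[] = [];
  claims.forEach((a, index) => {
    for (const b of claims.slice(index + 1)) {
      const nested =
        a.folder === b.folder ||
        a.folder.startsWith(b.folder + "/") ||
        b.folder.startsWith(a.folder + "/");
      if (a.phase !== b.phase && nested) {
        overlaps.push(`${a.phase}:${a.folder} overlaps ${b.phase}:${b.folder}`);
      }
    }
  });
  return overlaps;
}

const overlaps = findOverlaps([
  { phase: "build", folders: ["lib", "dist"] },
  { phase: "test", folders: ["coverage", "lib/test-results"] },
]);
console.log(overlaps);
// [ 'build:lib overlaps test:lib/test-results' ]
```

The equivalent check for globs would have to decide whether any path could match two patterns at once, which is considerably harder than comparing prefixes.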

There are probably some other reasons that are relevant, but those are the main ones that come to mind.

I was trying to sell Rush to another dev, but he ran into this issue and said, "I won't be rushing to Rush", as his team is using Next too.

I'm not a user of Next, but wouldn't the standalone build mode solve this issue?

Scott