rust-lang/rust

`cargo doc` in large workspace is 10 times slower on nightly

Opened this issue · 28 comments

Reproduction Steps

Logs from my CI on ubuntu-latest:

Bisection

searched nightlies: from nightly-2025-08-19 to nightly-2025-08-20
regressed nightly: nightly-2025-08-20
searched commit range: 9eb4a26...05f5a58
regressed commit: 8365fcb

bisected with cargo-bisect-rustc v0.6.8

Host triple: aarch64-apple-darwin
Reproduce with:

cargo bisect-rustc --timeout 120 --start 2025-08-19 --end 2025-08-20 -- doc 

Additional Details

I'd guess this is probably from #144476, CC @notriddle @lolbinarycat @GuillaumeGomez. I see that the regression was spotted by the perf test suite, but probably not to the degree that I'm hitting.

I tried creating a more minimal reproducer for this, but that was kinda hard, since it's a performance thing, so it only really shows up in larger cases :/.

I did try looking at Activity Monitor: it looks like there's often only a single rustdoc process consuming CPU, and only intermittently, so there might be some global locking going on. fseventsd also seemed very active, but that might be normal.

@madsmtm this issue seems to be at least mitigated by #144476

Can you check how it looks for you now? Thanks.

mitigated by #144476

#144476 is the PR that introduced the compile-time regression. Did you mean a different PR?

Never mind, sorry. I meant to link https://togithub.com/rust-lang/rust/issues/146048#issuecomment-3267080938, but I don't think it's relevant.

I am experiencing this slowdown on my project, building on aarch64-apple-darwin: stable takes 30 seconds and nightly (rustdoc 1.92.0-nightly (53a741fc4 2025-10-16)) takes 5 minutes. I also observe that only 1 rustdoc process is consuming CPU at a time. Here are a couple of sample -w rustdoc profiles of the running rustdoc processes: rustdoc-sample-1.txt, rustdoc-sample-2.txt.

Notably, one of them is indeed executing search index code (1235 _RNvNtNtNtCsgFlDbPMoJaE_7rustdoc4html6render12write_shared12write_shared) and the other one is waiting on a file lock (8420 _RNvMNtNtCsdf1zDdRBxVI_21rustc_data_structures5flock4unixNtB2_4Lock3new). However, other samples have shown other cases such as being inside trait solving, so these 2 should not be taken as exhaustive (it's difficult to catch everything).

Also, the pace of progress and the timings chart give me the impression that the behavior is quadratic, slowing down for later crates: is the search index code perhaps re-indexing the items from earlier docs? If so, a tricky but effective improvement would be for Cargo to ask rustdoc to index only as a final step (perhaps only crates that have no dependents in the current build graph).
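
To illustrate what I mean by quadratic, here is a minimal sketch (invented numbers, not rustdoc's actual code) of the total cost if every per-crate invocation re-processes the shared index accumulated by all earlier crates:

// Hypothetical model, not rustdoc's implementation: if each invocation
// re-reads the whole shared search index built so far before appending its
// own items, total work grows quadratically with the number of crates.
fn main() {
    let crates = 200;            // roughly the size of a large workspace
    let items_per_crate = 1_000;

    let mut index_len = 0usize;  // entries already in the shared index
    let mut total_work = 0usize; // entries touched across all invocations

    for _ in 0..crates {
        total_work += index_len;      // re-process everything indexed so far
        index_len += items_per_crate; // then append this crate's items
        total_work += items_per_crate;
    }

    // Prints: final index 200000 entries, 20100000 entries processed.
    println!("final index {index_len} entries, {total_work} entries processed");
}

The final index is small, but the work to get there is two orders of magnitude larger, which would match the "later crates get slower" pattern.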

If so, a tricky but effective improvement would be for Cargo to ask rustdoc to index only as a final step (perhaps only crates that have no dependents in the current build graph).

That would require RFC3662, right?

If so, a tricky but effective improvement would be for Cargo to ask rustdoc to index only as a final step (perhaps only crates that have no dependents in the current build graph).

That would require RFC3662, right?

Yes, it would. (It had not occurred to me that the search index might not be separate from the data that needs to be compiled into it, but if so, indeed that needs to be changed.)
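
Very roughly, the separation I have in mind looks like this (the types and function names are invented for illustration; this is not a concrete design for rustdoc or RFC3662):

use std::collections::BTreeMap;

// Phase 1: each per-crate rustdoc invocation only emits its own raw data.
struct RawCrateIndex {
    krate: String,
    items: Vec<String>,
}

fn emit_raw(krate: &str, items: Vec<String>) -> RawCrateIndex {
    // Cost proportional to this crate's items only.
    RawCrateIndex { krate: krate.to_string(), items }
}

// Phase 2: a single final pass compiles everything into the searchable
// index, so the expensive merge happens once rather than once per crate.
fn compile_index(raw: &[RawCrateIndex]) -> BTreeMap<String, Vec<String>> {
    raw.iter()
        .map(|r| (r.krate.clone(), r.items.clone()))
        .collect()
}

fn main() {
    let raw = vec![
        emit_raw("crate-a", vec!["Foo".to_string(), "bar".to_string()]),
        emit_raw("crate-b", vec!["Baz".to_string()]),
    ];
    println!("{} crates in the final index", compile_index(&raw).len());
}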

I'm experiencing this too, in several different projects. I'm finding it a very severe regression.

Is it not possible to revert the original problematic commit before this hits stable? Lots of people are going to be affected.

The bug seems like it has the appropriate labels, notably, P-critical and regression-from-stable-to-beta. Is there a process that would prevent a release in this situation? IMO there should be. We shouldn't be updating stable with a new version of Rust that has a critical bug...

There is no bug, just a performance impact. Performance regressions are treated differently. We considered the trade-off (i.e., much faster documentation search) to be acceptable. We have already merged multiple performance improvements, more are currently being reviewed, and there are still a lot more potential improvements we haven't tried yet.

I concur with @GuillaumeGomez: while cargo +1.91 doc will be basically unusable for me, I don't think the PR should be reverted, and I don't think this should block the release; affected users can do cargo +1.90 doc in the meantime.

Although it could be worthwhile to backport some performance improvements already merged to stable.

"Just a performance impact" is a serious understatement. A sufficiently severe performance impact can make the program unuseable. That is definitely the case here, at least some of the time.

Measurement on incremental builds (and therefore developer workflow)

Time for an incremental rebuild of docs in the Arti repository, after editing a lower level crate.

On stable:

real    0m33.135s
user    1m19.553s
sys     0m13.213s

On beta:

real    2m48.452s
user    3m22.964s
sys     0m36.719s

real    3m0.717s
user    3m31.454s
sys     0m37.915s

real    3m9.774s
user    3m39.976s
sys     0m38.090s

Numbers vary surprisingly much. But, this is a >5x slowdown. (Steps are below the cut.)

Impact

Since this regression landed in beta, I have been using stable to do all my local docs builds. I will have to stick with 1.90.0 I guess.

In CI, where full builds are more common, I expect the problem to be much more severe. I have no idea what the effect will be for the Arti CI, but I imagine at least some projects are going to find their CI goes red due to timeouts when this lands.

The Rust stability guarantee

I hate to bring this up, but since someone has said this:

while cargo +1.91 doc will be basically unusable for me, I don't think the PR should be reverted, and I don't think this should block the release

At least one of the following must be true:

  1. cargo +1.91 doc will be usable for the vast majority of people
  2. this part of the Rust Project does not intend to honour the Rust stability guarantee

There are a number of ways to achieve (1). But the starting point is that it must be treated as a release blocker. Yes, the stability guarantee is sometimes inconvenient, and it can reduce the rate of progress. But it's a foundational principle which is doing its work precisely when it's inconvenient.

Next procedural steps

Please would someone on the team let me know the proper escalation path for a review of what appears to be a decision to allow this regression to ship.

Stable:

rustc 1.90.0 (1159e78c4 2025-09-14)
binary: rustc
commit-hash: 1159e78c4747b02ef996e55082b704c09b970588
commit-date: 2025-09-14
host: x86_64-unknown-linux-gnu
release: 1.90.0
LLVM version: 20.1.8

Beta:

rustc 1.91.0-beta.10 (f2f881bb9 2025-10-25)
binary: rustc
commit-hash: f2f881bb99cf03bca0c54b4a0b1209b40b8cc383
commit-date: 2025-10-25
host: x86_64-unknown-linux-gnu
release: 1.91.0-beta.10
LLVM version: 21.1.2

Test rune:

echo >> crates/tor-basic-utils/src/lib.rs ; time cargo doc --locked --workspace --document-private-items --all-features 

Source code:

https://gitlab.torproject.org/tpo/core/arti de0e99458848f81e74de33bae68b24845c6a841a

My system:

Framework 13 AMD. Ryzen 7840U, 8 cores, 16 threads. 64 GB RAM. Debian trixie.

But the starting point is that it must be treated as a release blocker.

I fully agree with @ijackson here.
To me, documentation is at least as important as the code itself, as the latter often falls into place automatically when a system is well-documented.
What I fear, if this makes it into a stable release, is that it might lead to an overall decline in documentation across the Rust ecosystem: it would no longer be convenient to run cargo doc to check that everything is alright, and the many people who already see documentation as less important than I do might neglect it even further if the tooling becomes more and more expensive.

Can you give nightly a try to see the difference with stable? That would help with what I wrote above:

Although it could be worthwhile to backport some performance improvements already merged to stable.

I tried nightly-2025-08-19 vs. nightly-2025-10-27 in my workspace on a Macbook M2 Pro, got:

$ cargo clean
$ cargo +nightly-2025-08-19 doc --workspace --target aarch64-apple-darwin
    ...
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 51.16s
$ cargo clean
$ cargo +nightly-2025-10-27 doc --workspace --target aarch64-apple-darwin
    ...
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 33m 13s

So I sadly don't think the current improvements are enough.

EDIT: Use 2025-08-19 as baseline instead of 2025-04-11.

I have never felt any need for faster rustdoc search.

I have often felt the need for using cargo doc locally, which this will make, for any reasonable use case, impossible.

Please do not stabilize with this big of a performance regression. Sometimes performance regressions are so bad they should not be treated like performance regressions, but like actual blocking bugs, no matter what the established processes say.

No feature, especially not something as unimportant as docs search performance, is worth half-hour long docs builds.

Can you give nightly a try to see the difference with stable? That would help with what I wrote above:

Incremental docs build in arti.git (which was previously 32s +/- 1s) on nightly:

real    2m40.776s
user    3m22.290s
sys     0m31.919s

real    2m49.206s
user    3m28.374s
sys     0m32.438s

real    3m9.174s
user    3m45.404s
sys     0m35.252s

Within the margin of error compared to beta. The restoration of performance, if there is one, is less than the (also much increased) variability, at least with this relatively small sample. Effectively the same unconscionable regression from stable.

I think this can only be made to perform reasonably by not redoing the index on every crate. That isn't going to happen in 1.91. Therefore a revert is the appropriate response for 1.91.

Same setup as in #146895 (comment)

rustc 1.93.0-nightly (f37aa9955 2025-10-26)
binary: rustc
commit-hash: f37aa9955f03bb1bc6fe08670cb1ecae534b5815
commit-date: 2025-10-26
host: x86_64-unknown-linux-gnu
release: 1.93.0-nightly
LLVM version: 21.1.3

Incremental docs build in arti.git (which was previously 32s +/- 1s) on nightly:

Could you add the specs of the machine which built this, for reference?

Please would someone on the team let me know the proper escalation path for a review of what appears to be a decision to allow this regression to ship.

Please do not stabilize with this big of a performance regression. Sometimes performance regressions are so bad they should not be treated like performance regressions, but like actual blocking bugs, no matter what the established processes say.

I can't speak with any sort of authority on the subject here, I only relatively recently joined the compiler team. The following is my understanding of the situation, please take it with a (large) grain of salt:

There are several different teams in charge of various parts of Rust. The Rustdoc team is the relevant team in this case, they're responsible for rustdoc, including regressions in it like this. There is no "escalation path" higher than this team as far as I'm aware.

I know we've said "block a release", but that isn't actually a thing that the Rust project does. Instead, the release team "backport" approved pull requests to the beta branch. For comparison, I know that the compiler team discuss backport nominations in this Zulip thread, and later decide upon them in weekly meetings, and I know that we're usually pretty conservative. I don't know how the rustdoc team handles backport nominations and approvals, but I imagine it's something similar.

So, for a revert to happen, someone has to create and nominate a PR for beta backporting, and it has to be accepted by the rustdoc team. The problem here is that the regression is a large PR, with a bunch of later PRs on top improving various other things, so we'd effectively have to revert several weeks of work. Doing that correctly without introducing other regressions is... Hard, to say the least. I see that the rustdoc team recently discussed this issue and this possibility on Zulip as well; I'm not sure if there was a clear conclusion.

So in theory a revert would be possible; realistically, I doubt it's gonna happen with 3 days left until the release.

Maybe if we'd had this conversation earlier it would've been more feasible? I actually noticed the issue a few weeks prior to reporting it, but was too lazy to report it then, so I guess I'm partially at fault for not reporting it earlier (which is not to say that I assign blame to anyone else; if anything's to blame, it's our process).

Does this summary make sense? If not, I'd love to clarify.

Could you add the specs of the machine which built this, for reference?

It's a very fast laptop. Please see below the cut in #146895 (comment)

Maybe if we'd had this conversation earlier

I see that the P-critical label was added 3 weeks ago. I noticed this myself on approximately that timescale (I run with beta usually, precisely to try to help detect problems early), but of course with a P-critical ticket already open, one would expect the bug not to ship.

The problem here is that the regression is a large PR, with a bunch of later PRs on top improving various other things

One approach would be:

  • Extract a copy of the pre-rewrite code from the git history
  • Put it into a module alongside the new code
  • Put the new code behind an unstable flag
  • Arrange to test both versions

This is perhaps what ought to have been done to start with. The compiler team frequently take this approach when something important is rewritten.

If this can't be done at this late stage of 1.91, it could be done in nightly and beta and maybe backported to a 1.91.1.
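
In code terms, the shape I mean is roughly this (module and flag names are invented; this is a sketch of the pattern, not a patch against rustdoc):

// Pre-rewrite implementation, recovered from git history and kept alongside
// the new one. (Placeholder body; the real code would live here.)
mod write_shared_legacy {
    pub fn write_shared(items: &[String]) -> String {
        items.join("\n")
    }
}

// Post-rewrite implementation, reachable only behind an opt-in flag.
mod write_shared_new {
    pub fn write_shared(items: &[String]) -> String {
        items.join("\n")
    }
}

pub struct Options {
    // Imagined unstable flag; off by default, so stable keeps the old code path.
    pub unstable_new_search_index: bool,
}

pub fn write_shared(opts: &Options, items: &[String]) -> String {
    if opts.unstable_new_search_index {
        write_shared_new::write_shared(items)
    } else {
        write_shared_legacy::write_shared(items)
    }
}

fn main() {
    let opts = Options { unstable_new_search_index: false };
    assert_eq!(write_shared(&opts, &["a".to_string(), "b".to_string()]), "a\nb");
}

That way both versions stay testable, and the default on stable remains the code path people already have.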

I'm afraid I'm not really qualified to do that work.

The implementation is scattered over 25+ files and has 8327 insertions and 4300 deletions (both excluding tests), and I would expect several of those files to be modified for unrelated reasons, so I would assume neither reverting nor creating a separate copy with the old code (which is basically a revert plus extra work) is feasible without the risk of more severe regressions (as in doc generation being completely broken rather than only slow). https://github.com/rust-lang/rust/pull/144476/files

I tried nightly-2025-08-19 vs. nightly-2025-10-27 in my workspace on a Macbook M2 Pro, got:

$ cargo clean
$ cargo +nightly-2025-08-19 doc --workspace --target aarch64-apple-darwin
...
Finished dev profile [unoptimized + debuginfo] target(s) in 51.16s
$ cargo clean
$ cargo +nightly-2025-10-27 doc --workspace --target aarch64-apple-darwin
...
Finished dev profile [unoptimized + debuginfo] target(s) in 33m 13s

So I sadly don't think the current improvements are enough.

EDIT: Use 2025-08-19 as baseline instead of 2025-04-11.

Well, nightly is ~40% faster than current beta so I think it would make this performance regression bearable. List of PRs to be backported:

Well, nightly is ~40% faster than current beta so I think it would make this performance regression bearable.

I'm afraid I think this is completely wrong. When assessing the regression, we should be comparing with stable. The thing you are quoting shows a factor of 35 regression even with this 40% improvement.

Factors of 0.6 are nothing when we're sometimes multiplying them by 50.

And again I'm saying that we're not planning to revert this change for a lot of reasons. I'm sorry for the performance regression, this is being worked on.

And again I'm saying that we're not planning to revert this change for a lot of reasons. I'm sorry for the performance regression, this is being worked on.

Thanks for the sympathy, but it would be more reassuring if you didn't try to minimise the scale of the problem.

Is anyone working on doing something to make it

  1. not use an approach with quadratic complexity in the number of crates
  2. not perform per-crate actions while holding a global lock

? Almost any kind of bodge to fix those two problems would remove the regression and probably replace it with a perf improvement (since I think it was quadratic before).

Without that, refinements to algorithms aren't going to make it usable again.
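
To be concrete about point 2, a hedged sketch (invented names, nothing to do with rustdoc's actual code) of the difference between doing the per-crate work while holding the shared lock and holding it only for the final append:

use std::sync::Mutex;

// Stand-in for the expensive per-crate work (rendering, collecting items, ...).
fn build_crate_entries(krate: &str) -> Vec<String> {
    vec![format!("{krate}::some_item")]
}

// Everything happens under the lock, so concurrent invocations serialise:
// only one documenting process makes progress at a time.
fn under_global_lock(shared: &Mutex<Vec<String>>, krate: &str) {
    let mut index = shared.lock().unwrap();
    index.extend(build_crate_entries(krate));
}

// The expensive part happens outside the lock; it is held only to append.
fn lock_only_for_append(shared: &Mutex<Vec<String>>, krate: &str) {
    let entries = build_crate_entries(krate);
    shared.lock().unwrap().extend(entries);
}

fn main() {
    let shared = Mutex::new(Vec::new());
    under_global_lock(&shared, "crate-a");
    lock_only_for_append(&shared, "crate-b");
    println!("{} entries", shared.lock().unwrap().len());
}

(An in-process Mutex here is just for illustration; the same applies to the on-disk file lock that showed up in the samples earlier in this thread.)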

a factor of 35 regression

I will say that this is in an unreasonably huge workspace with almost 200 crates and thousands of generated files. I don't think it's really fair to compare that to any other real-world uses; I suspect a 40% speed improvement is worthwhile and noticeable for "normally" large workspaces.