conda-forge/conda-forge-pinning-feedstock

migrations on arrow-cpp-feedstock run into timeout while rerendering

jdblischak opened this issue · 22 comments

Comment:

The pin for aws-crt-cpp was recently updated to 0.26.8 (#5762), which started the current migration aws_crt_cpp0268. It was successful for tiledb-feedstock (conda-forge/tiledb-feedstock#279), but unfortunately it failed with a rerendering error for arrow-cpp-feedstock. This is causing a problem because tiledb-py-feedstock requires tiledb and pyarrow, which currently generates a solver error (TileDB-Inc/conda-forge-nightly-controller#92 (comment)).

I've never attempted to troubleshoot a migration error for a feedstock that I don't maintain. As a first step, could someone with sufficient access try to restart the migration? Is there anything I can do to help?

The problem is that the rerendering takes longer than 900s. This is a newly added timeout in the bot. We need to make an exception for arrow-cpp in the bot (or wait for the next conda-build release).

The problem is that the rerendering takes longer than 900s. This is a newly added timeout in the bot.

Can I manually apply the migration locally and send a PR to arrow-cpp-feedstock? Or would that interfere with the migration?

It should solve your problem, but long-term the bot needs the timeout fix.

It should solve your problem, but long-term the bot needs the timeout fix.

Done in conda-forge/arrow-cpp-feedstock#1375
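For reference, a rough sketch of the manual process, assuming the usual conda-forge workflow (the pinning repo layout and the migration filename below are assumptions; adjust as needed): copy the migrator yaml from conda-forge-pinning into the feedstock's `.ci_support/migrations/` directory and rerender with conda-smithy.

```python
# Minimal sketch of applying a migration by hand. Paths and the
# migration filename are illustrative assumptions, not verified values.
import pathlib
import shutil
import subprocess

FEEDSTOCK = pathlib.Path("arrow-cpp-feedstock")  # local clone of the feedstock
MIGRATION = "aws_crt_cpp0268.yaml"               # assumed filename under recipe/migrations/

# 1. Copy the migrator yaml into the feedstock so conda-smithy picks it up.
src = pathlib.Path("conda-forge-pinning-feedstock/recipe/migrations") / MIGRATION
dst_dir = FEEDSTOCK / ".ci_support" / "migrations"
dst_dir.mkdir(parents=True, exist_ok=True)
shutil.copy(src, dst_dir / MIGRATION)

# 2. Rerender; --commit auto creates the rerender commit ready for a PR.
subprocess.run(
    ["conda", "smithy", "rerender", "--commit", "auto"],
    cwd=FEEDSTOCK,
    check=True,
)
```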

The problem is that the rerendering takes longer than 900s. This is a newly added timeout in the bot.

When did this get added? It's too tight for several feedstocks - IMO it needs to be bumped ASAP. CC @beckermr

Fortunately, a bunch of speedups landed in the upcoming conda-build 24.5, but we don't have that one available yet.

That timeout has been in place for several years. I did some refactoring but idk what would have changed to make a difference like this.

It will effectively break all migrations from working on the arrow-feedstock, which has a disproportionate impact because we have so many deps there that need migrating, as well as 4 maintenance branches. Basically, that feedstock is not workable without the bot opening these PRs, so we need to fix this real soon, as it's going to cause logjams in several places otherwise. A timeout exception for arrow is probably the easiest. Or we could let the bot run on an unreleased conda-build (that we could build into a label). Or investigate what recent refactors regressed...
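To illustrate the kind of exception meant here (this is not the bot's actual code; the override table and function are hypothetical), a per-feedstock override of the rerender timeout could look roughly like:

```python
# Hypothetical sketch of a per-feedstock rerender timeout override --
# not the actual regro-cf-autotick-bot implementation.
import subprocess

DEFAULT_RERENDER_TIMEOUT = 900  # seconds; the limit arrow-cpp is hitting

# Feedstocks known to need longer than the default (values illustrative).
TIMEOUT_OVERRIDES = {
    "arrow-cpp": 3600,
}

def rerender_with_timeout(feedstock_dir: str, feedstock_name: str) -> None:
    timeout = TIMEOUT_OVERRIDES.get(feedstock_name, DEFAULT_RERENDER_TIMEOUT)
    # subprocess.TimeoutExpired is raised if the rerender exceeds the limit.
    subprocess.run(
        ["conda", "smithy", "rerender"],
        cwd=feedstock_dir,
        timeout=timeout,
        check=True,
    )
```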

The timeout has been there several years and nothing was broken. I honestly don't get what's suddenly changed or regressed.

I just checked the pull requests for arrow-feedstock and I cannot find a PR from the bot for several years at least. Maybe I messed up checking?

Otherwise, this matches the fact that the bot has been enforcing a timeout for many years on long rerenders.

What is suddenly going to break for arrow-feedstock now when it never got PRs anyway?

I just checked the pull requests for arrow-feedstock and I cannot find a PR from the bot for several years at least.

I don't know how you searched, but we've been getting 10+ PRs per month. Last one was April 23rd 02:55 UTC.

I honestly don't get what's suddenly changed or regressed.

Arrow has been getting long rerender times for a while (and once @mbargull became aware of this, he opened conda/conda-build#5224 and tackled some bottlenecks; the result of that is going to show up in conda-build 24.5). It's possible that we were close to the timeout already, and that some minor regression pushed us from, say, ~850s to over 900.

Those PRs are for arrow-cpp, not arrow.

You mentioned arrow above, right?

It will effectively break all migrations from working on the arrow-feedstock, which has a disproportionate impact because we have so many deps there that need migrating, as well as 4 maintenance branches. Basically, that feedstock is not workable without the bot opening these PRs,

^ This is what I was responding to.

Yes, sorry, that was an inaccuracy on my part. There is a (IMO) much less important arrow library for timestamps, but I was referring to apache/arrow, the "composable data stack" behemoth that's eating the world (in a nice way), which for us is built on https://github.com/conda-forge/arrow-cpp-feedstock (various libs & python bindings)

Ahhhhh that makes more sense.

On the timeout itself, the bot enforces a timeout because very long rerenders make the bot even slower than it already is. The 15 minute (=900 second) timeout is an order of magnitude or more longer than the typical feedstock rerendering time. I get that this is frustrating.

However, looking at the arrow-cpp recipe (which I have not spent much time on at all), it appears to be nearly a dozen packages in one. Splitting that feedstock up into more than one would potentially have other benefits in addition to easing the burden on our already overstressed tools.

The timeout has been there several years and nothing was broken. I honestly don't get what's suddenly changed or regressed.

To share my anecdotal experience, I just manually applied this migration to the 14.x, 13.x, and 12.x branches. Rerendering the 12.x and 13.x branches was relatively quick (they finished before 14.x even though I started them afterwards). 14.x took a lot longer, and obviously main took the longest (15+ min). So I don't think the recent large increase in rerendering time is solely due to changes to conda-smithy. I assume that something about the structure of the feedstock changed between 13.x and 14.x, and that rerendering has only gotten slower over time. Note that I am using the latest conda-smithy release (3.34.1).
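For anyone who wants to reproduce the comparison, a rough timing sketch (assumes a local clone and conda-smithy installed; branch names are taken from the comment above and may differ, e.g. vN.x):

```python
# Rough sketch: time `conda smithy rerender` on each branch of a local
# clone to compare how rerender time grows between release branches.
import subprocess
import time

FEEDSTOCK = "arrow-cpp-feedstock"  # path to a local clone
BRANCHES = ["12.x", "13.x", "14.x", "main"]

def run(*cmd):
    subprocess.run(list(cmd), cwd=FEEDSTOCK, check=True)

for branch in BRANCHES:
    run("git", "checkout", branch)
    start = time.monotonic()
    run("conda", "smithy", "rerender")
    elapsed = time.monotonic() - start
    print(f"{branch}: rerender took {elapsed:.0f}s")
    # Discard the rerender output before switching branches.
    run("git", "reset", "--hard")
    run("git", "clean", "-fd")
```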

I didn't notice that you had marked this issue as closed in conda-forge/arrow-cpp-feedstock#1375. I'm going to reopen it, as the underlying problem hasn't been fixed.

Splitting that feedstock up into more than one would potentially have other benefits in addition to easing the burden on our already overstressed tools.

I opened an RFC for that: conda-forge/arrow-cpp-feedstock#1381

Migrations have been coming again for arrow-cpp, so closing this.

Really? I haven't seen a bot PR yet. All the recent migrations were done by hand.

Ah! I had only seen that migrations were happening and didn't realize you wanted them to happen automatically. I'll look into this next week (when I fix the aws-sdk-cpp release)

It's possible that the performance improvements in conda-build 24.5 unblocked this. There are new PRs for main (much reduced after splitting off the pyarrow bits), as well as for v15.x / v14.x and below (still big to rerender).

Closing as we reliably get PRs for the feedstock again.