google/heir

ci: LLVM Upstream Bazel BUILD issue

Closed this issue · 6 comments

llvm/llvm-project@0309709

Waiting on this commit to be merged.

Given the issue @AlexanderViand-Intel posted recently as well, can we actually implement a delay of a few days, and/or set up a manual job to update LLVM weekly?
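For illustration only, the "few days delay" idea could be sketched roughly as below: given candidate upstream commits, pick the newest one that is already at least a few days old. The function name `pick_delayed_commit`, the three-day default, and the example SHAs are all hypothetical; nothing here reflects how the HEIR integrate is actually wired up.

```python
from datetime import datetime, timedelta, timezone


def pick_delayed_commit(commits, now, min_age_days=3):
    """Return the SHA of the newest commit that is at least min_age_days old.

    commits: iterable of (sha, commit_time) pairs with timezone-aware times.
    Returns None when no commit is old enough.
    """
    cutoff = now - timedelta(days=min_age_days)
    # Keep only commits that have had min_age_days to be fixed or reverted upstream.
    eligible = [(when, sha) for sha, when in commits if when <= cutoff]
    if not eligible:
        return None
    # The largest (time, sha) pair is the most recent eligible commit.
    return max(eligible)[1]


if __name__ == "__main__":
    # Hypothetical SHAs and dates, purely for demonstration.
    now = datetime(2024, 1, 10, tzinfo=timezone.utc)
    candidates = [
        ("aaa111", datetime(2024, 1, 9, tzinfo=timezone.utc)),
        ("bbb222", datetime(2024, 1, 6, tzinfo=timezone.utc)),
        ("ccc333", datetime(2024, 1, 2, tzinfo=timezone.utc)),
    ]
    print(pick_delayed_commit(candidates, now))
```

The trade-off of a scheme like this is that fixes also arrive a few days late, so it only helps when upstream breakages tend to be repaired quickly.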

I don't think we can do that with the way everything is set up. Internal CI should have prevented this from going through anyway. Let's check in on the internal build and figure out how it got merged in a broken state.

There was another issue where the CMake build was updated but the Bazel build was not updated correctly.

On the internal CL for the latest HEIR integrate update commit (b08c7ae78c98bad13b0d71e24591a3d5205cc33a), our copybara job was cancelled (and the logs are blocked for me).

But externally, the commit I linked above hasn't been merged yet.

> Let's check in on the internal build and figure out how it got merged in a broken state.

The internal build is fine, the BUILD rule was updated internally. But externally, there's a different Bazel BUILD file that doesn't have the fix until the commit I linked above, and that's not merged yet.

> Internal CI should have prevented this from going through anyway.

How does internal CI prevent this from going through if internal CI uses the blaze BUILD rules, and our GitHub presubmits don't block submission because we're not on SLA?

We are on SLA. No internal integrates are possible unless (a) our TAP passes or (b) someone force submits the integrate change or (c) one of our skipped tests is the broken part (e.g. some of the end to end rust tests may fall under this).

I can probably pop in in a few hours to take a closer look. Currently with my kid at the dentist 🦷🪥

OK I understand the issue now.

I read through what happens for BUILD files, and I'll link some internal docs (which you've read, since you've been on integrate rotation). The LLVM integrate CL that we merged in had a locally patched BUILD file and, as with TensorFlow, that breaks our open source build. The docs recommend that TensorFlow (and so us too) patch the open source build.

> We are on SLA. No internal integrates are possible unless (a) our TAP passes or (b) someone force submits the integrate change or (c) one of our skipped tests is the broken part (e.g. some of the end to end rust tests may fall under this).

Yeah, like I mentioned before, our TAP did pass, since the BUILD file had the fix in it. But that fix was just a local patch in the integrate branch; it is not in the open source LLVM project.
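As a rough sketch of how this kind of gap could be checked from the open source side, the snippet below asks git whether a given fix commit is an ancestor of the LLVM commit being pinned, using `git merge-base --is-ancestor`. The function name `fix_is_upstream`, its parameters, and the idea of running it against a local llvm-project checkout are assumptions for illustration, not part of the actual integrate tooling.

```python
import subprocess


def fix_is_upstream(fix_sha, pinned_sha, repo_dir=".", run=subprocess.run):
    """Return True if fix_sha is an ancestor of pinned_sha in the repo at repo_dir.

    `run` is injectable so the logic can be exercised without a real checkout.
    """
    result = run(
        ["git", "-C", repo_dir, "merge-base", "--is-ancestor", fix_sha, pinned_sha],
        capture_output=True,
    )
    # git exits with 0 when fix_sha is an ancestor and 1 when it is not.
    if result.returncode == 0:
        return True
    if result.returncode == 1:
        return False
    # Any other exit status means git itself failed (e.g. unknown commit).
    raise RuntimeError(f"git merge-base failed with exit code {result.returncode}")
```

A check like this would report `False` for a fix that exists only as a local patch in an internal branch, which is the situation described above.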

Someone just noted that this also broke Triton's open source build, so I'm tracking what's supposed to have happened there.