scala/scala-dev

do PR validation on Travis-CI (or GitHub Actions?) instead of Jenkins

SethTisue opened this issue ยท 20 comments

the groundwork for this is already laid.

the main blocker had been that partest took too long but that's fixed now, so we'll be well under the 90 minute limit the Travis-CI folks have kindly granted us.

this is now pressing since scala-ci.typesafe.com is down.

WIP on Travis-based PR validation: scala/scala#6621

in order to finish this, we'll need to get scala-ci's Artifactory back up, for publishing the PR snapshots.

we'll need to get scala-ci's Artifactory back up, for publishing the PR snapshots

it's back up.

note that it isn't only about publishing the PR snapshots for public consumption, it's also about doing like the old validate/publish-core and validate/test scripts do, the so-called mini-bootstrap: we build, publish to Artifactory, point STARR to what we just built, then run the tests.

my initial stabs at this tried to share code with the full bootstrap-and-publish job, but the bootstrap stuff is so hairy (for both inherent and historical/accidental/fixable reasons), it's going to easier to work separately from that first, then once it's working, consider removing duplication. (Lukas agrees.)

the nicest experience will be achieved if we split up the parts of testAll so they all run in parallel on Travis and produces a separate logfile. but for starters I'm just trying to run testAll and get that working.

we can't publish the PR snapshots to Artifactory because pull request jobs on Travis-CI don't have access to secure environment variables

I suggest we:

  • use publishLocal to do PR validation itself
  • worry separately about publishing to Artifactory
    • probably by adding a new /publish command to Scabot (usable only by committers) that would push the PR branch to scala/scala (under some standardized naming scheme like PR/1234) at which point the existing publish stuff would take over. (this would have the advantage of publishing fully bootstrapped rather than mini-bootstrapped artifacts)

cleaner PR at scala/scala#6630. uses publishLocal for the mini-bootstrap; doesn't otherwise publish.

I hit merge on scala/scala#6630, so remaining todos include:

  • let both forms of PR validation operate in parallel for a while and see how the Travis side does, in particular whether it catches the same failures Jenkins catches
  • see if we can cut the runtime back, perhaps by:
    • paying for premium instances
    • parallelizing the different parts of testing, by having a "build" stage copy files to S3 (Lukas has ideas on how to accomplish this securely without secrets, which PR jobs can't have) first
  • figure out a way to publish to scala-pr-validation-snapshots from Travis-CI
    • Lukas has ideas, probably involving an entirely separate job/repo, or something? can't recall details
    • Seth feels like adding an explicit /publish command to Scabot might be adequate, no strong feeling though
  • make appropriate changes to Scabot
  • integrate better with the new GitHub "checks" stuff
  • eventually disable the Jenkins jobs, remove the Jenkins scripts from the scripts directory in the scala/scala repo, and perhaps shut down a behemoth or two

we haven't been super happy with Travis-CI for PR validation. timeouts, subpar UI for log browsing and analysis, ...

the publishing-from-Travis aspect has worked fine, but with a big exception, which is that we don't have a way to publish Scala versions for as-yet-unmerged PRs, because we don't have access to the necessary encrypted environment variables from PR runs. (whereas our Jenkins setup has those secrets, in a way we don't think any PR can inappropriately access.)

today, in 2019, GitHub Actions exists and seems worth considering as well.

lrytz commented

I made a variation to run on github actions a small while back. It demonstrates running on a matrix of adoptopenjdk 8, 11 and 12 (and failing on everything but 8) https://github.com/martijnhoekstra/scala/runs/220138323

@smarter is pushing GitHub Actions. I think maybe he gets a commission ๐Ÿ’ฐ

but the Dotty folks don't have a unmerged-PR-publishing solution, scala/scala3#6145 remains open. regardless, we can surely learn from what their team is doing

as we discussed in scala/contributors Gitter today, note that Travis-CI and Jenkins test different things. Jenkins tests the PR's branch HEAD as-is. Travis-CI tests the merge commit. (though by the time we actually hit "merge", HEAD of the target branch may already have moved on, so that result may be stale.)

there are pros and cons to both.

note also that with Travis-CI you get different behavior depending on whether the PR comes from a fork or not. if the PR comes from a branch in the same repo, you get both the branch run and the merged run. but in scala/scala all PRs always come from forks. so the branch run only occurs if Travis-CI has explicitly been enabled in the fork. (except, gah, at present in scala/scala that branch run will always fail; see #663)

(Moving from scala/scala#8731 (comment))

@lrytz

It's one-time only anyway, the travis build is not re-run as the target branch moves ahead.

True. I guess given enough merge conflicts, old branches will always have to be rebased before their PRs can be merged (relevant to me as I just finished rebasing scala/scala#5892). But, otherwise, the risk here is that an old branch goes green on all its commits, but it then break once merged (semantic conflicts).

Makes me wonder if there's a relationship between how verbose/terse a language is and how frequent projects in those languages get merge conflicts. Did bors come out of c/c++/rust projects which had less merge conflicts but more semantic conflicts? ๐Ÿค”

lrytz commented

Semantic conflicts happen once in a while, recently scala/scala#8709

Yeah, and those are more common than my PR-of-an-old-branch case.

Is there a list of things somewhere I can check that the new tool must cover? Wanted to try this out for GitHub Actions

Is there a list of things somewhere I can check that the new tool must cover?

What Travis-CI currently does is in our .travis.yml, and what Jenkins does is in scripts/jobs/validate/test. There is some doc in README.md.

differences I'm aware of:

  • Windows testing is currently Jenkins-only (and uses scripts/jobs/integrate/windows, not scripts/jobs/validate/test, though the two scripts are broadly similar)
  • for no special reason, only Travis-CI currently does the extra Dotty-compatibility check sbt -Dscala.build.compileWithDotty=true library/compile
  • actual publishing of releases (including nightlies) is now only on Travis-CI
  • but only Jenkins publishes to scala-pr-validation-snapshots, as already discussed in comments above

in general, PR validation is easy-ish (the bootstrap aspect is only modestly tricky) and publishing is harder

There would be value even in just running the tests on GitHub Actions, especially if it worked on both Linux and Windows. Even if we can't retire the rest of Jenkins yet, it would be awfully nice if we could retire the Windows aspect of Jenkins (2.13, 2.12, 2.11). (At one point we were thinking of accomplishing that by adding an Appveyor build, but if GitHub Actions can do it, then great.)

If any solution can be used on all three branches (2.11.x, 2.12.x, 2.13.x), that'd be great. But there's value even in a solution that leaves out 2.11 (and/or 2.12). (For example, we didn't bother moving 2.11 release publishing off Jenkins, since there may never be another 2.11.x release.)

Wholesale changes to all of this can be daunting, but incremental change can have value too.

parallelizing the different parts of testing, by having a "build" stage copy files to S3 (Lukas has ideas on how to accomplish this securely without secrets, which PR jobs can't have) first

Dale and I got this done recently using Travis-CI's new-ish "workspaces" feature.

so the branch run only occurs if Travis-CI has explicitly been enabled in the fork. (except, gah, at present in scala/scala that branch run will always fail; see #663)

Dale fixed this recently, but the tests take long enough that it's of limited usefulness to contributors, who will use up their free allotment of minutes pretty darn fast, under Travis-CI's new limits.

closing โ€” #751 is the new ticket on this

Dale fixed this recently, but the tests take long enough that it's of limited usefulness to contributors, who will use up their free allotment of minutes pretty darn fast, under Travis-CI's new limits.

I've reverted back to travis-ci.org for the rest of the month... but I'll need a new idea in the new year.