Integration testing on master
foxish opened this issue · 24 comments
Elaborating on #545 (comment)
We need to separate our integration testing out - so we can make it run any arbitrary branch - including master.
cc @apache-spark-on-k8s/contributors @ifilonenko @erikerlandson @ssuchter @kimoonkim
are you thinking of factoring the IT into a separate repo?
I think that might be one way. If we just have the integration tests be built separately, as a project by itself, and then invoke it against master. It might not be the most expedient - open to other suggestions as well.
One advantage of that would be that it would be convenient to make separate branches, having different IT subsets targeted to various upstreaming stages of functionality
Would we then have to re-integrate them into the upstream so it has the correct IT?
It might be desirable for upstream also to have the same decoupled integration testing, similar to YARN/Mesos.
+1, that makes it a two-fer
maybe even a 3-fer, it keeps more code out of the main upstream PRs
cc @felixcheung
Having it outside of the main repo is easier and faster to iterate, which is why we did it also for Mesos..
I added a tracking issue for jenkins-infra: https://github.com/ucbrise/jenkins-infra/issues/86
when you say decoupled testing, same as YARN/mesos, do you mean the test and source will be completely outside of the Apache Spark git repo?
(because as far as I can see, there is no open source YARN integration tests for Spark, for example)
I agree with the separate repo. However, this will require a group of separate group to maintain and monitor this repo as well.
If it's about fast iteration, I'd say go for it.
At some point though, I'd agree with @ssuchter OSS (or even, under ASF) and same branch/keep in-sync is very important.
I agree - we shouldn't look at it as a separate thing long term. It's expedient for now to separate it out - to get some confidence in upstream by running tests against it. Eventually, it should live alongside the rest of the code upstream, but I do think it needs some hardening before that, and we can't hit that for 2.3
We probably want to use the testing version provided by #521. There's one more change required to make this pass on Amplab Jenkins. I'll patch this branch and bring it up to speed with our mainline, then we should think about how we're going to move the code over.
Automation is something to consider as well. We probably want to run these tests in the separate repository on every commit to master that affects Kubernetes. We could run the suite on every commit to master in general as well.
I think we can break the integration testing out further after #521 is merged.
The immediate next steps I see are:
- Separate integration testing code into https://github.com/apache-spark-on-k8s/spark-integration.
- Enable generating a JAR from it, which can take a parameter for a runnable spark distribution to test.
- Have Jenkins build a spark runnable distribution, and then invoke our integration test jar to test it.
For now, it would still depend on minikube. In the future, we can do even better.
- Separate out the image building to use a registry. In minikube, we'll build and push to the minikube local registry which can be enabled as an addon.
- Enable the integration-testing JAR to take additional parameters for K8s cluster, docker registry, and runnable spark distro.
This would enable running on other k8s clusters, enabling for example, stress testing on GKE if we want to do that, and also let other users test the release on their own clusters using our integration tests.
Are we still planning to run integration tests in the Jenkins PR builder on master? Or was the plan to run the tests in our own automated system?
Not sure if we'll get that done in time.
With this split, even if we don't get the integration tests checked in for Spark 2.3 or the spark prb changes, we'd still have the option to run it ourselves.
@foxish I was thinking about this a little more and am concerned that this setup isn't too intuitive. (edit: the proposed setup being #568 (comment))
What these semantics are basically saying is that the main repository depends on the tests. This communicates that the main repository is checking the correctness of the tests, not the other way around.
To illustrate this point, consider the situation where a developer writes a new integration test for a feature they just merged into master. Consider the workflow:
- Developer writes unit tests and main feature in apache/master. They don't update the integration test version because integration tests aren't published yet. The change is merged.
- Developer writes and publishes integration tests that are broken for the feature.
- Developer updates main repository's integration test dependency to try the new tests.
- Developer finds the tests are wrong, so they have to publish a new integration test version.
This sequence of events makes it clear that the main repository is determining whether or not the integration tests are correct. But the more intuitive mental model is that the integration tests are validating the correctness of a given Spark artifact.
Consider also the situation when the main repository is changed such that the assumptions made by the given integration test version are invalid. In this workflow, the developer is blocked from merging into master before the new integration test version is published. But in between the time that the new integration tests are published and the main change merges, that new integration test version is only applicable for a version of the main repository that doesn't exist yet.
So I think we want the integration tests to pull down and test the repository under test, not the other way around.
So I think we want the integration tests to pull down and test the
repository under test, not the other way around.
I agree with that @mccheah. I wasn't proposing the opposite but I guess it got confusing since I went into a bit of the implementation as well in my comment. I don't think we want to "publish" the integration test jar as a separate entity. My point was that jenkins could take care of building the latest integration tests (from wherever they live, which may be for now - https://github.com/apache-spark-on-k8s/spark-integration), and the distro (from a particular PR), and then have the integration tests run on the distro. The point about jars was just to make the separation point and isn't really a necessity.
I don't mind having the integration tests actually pull down the repo and build it for now if it helps save time. I think it's the same whether jenkins does, or if we have the test logic itself do it.
Ah, sorry for the misunderstanding there. I think we want the integration tests to target a git hash on either a remote repository to clone or a local checkout for development iteration. But maybe we should support testing against Spark distribution tarballs also because we can save build time by using a pre-published Spark distribution instead of building it ourselves in the integration test CI.