kubernetes/test-infra

TestGrid Open-Sourcing Planning

michelle192837 opened this issue · 46 comments

ETA (7/25/19): We've got some updates! Check them out below.

tl;dr: If you're interested in TestGrid being open-sourced, add your comments, support, and use cases here!

ETA: While TestGrid open-source is in the works, you can add your results to testgrid.k8s.io by following the normal process. Your results will need to conform to expected results for now; I'll update when more possibilities are available.

I'm pulling together a roadmap for TestGrid development in 2019! (Since I'll be on vacation in the latter half of December, I'm drafting it this week, and the final one will get posted in January). There's been scattered support, comments, and bugs for TestGrid open-sourcing, but I want to collect all that in one bug for convenience's sake.

So! Comments! If you have details on what you need specifically (i.e. what parts are most important to prioritize, like being able to stand up a separate frontend, upload results, run your own updaters, etc.), or on what you'd be using TestGrid for, note that too. The more comments the better!

Related issues: #3324, #3323

/kind feature
/area testgrid

Talked with @michelle192837 in person today - would be great to have a quick overview of the architecture of TestGrid and current progress.

From a naive point of view, it seems like we could process our test results and upload our own proto for a third-party front end to use; that would take minimal effort to get us to dashboards.

👋 I work on AKS; (likely @juan-lee and) I would be interested in using it very much like upstream for our staging edge

Btw, I am also happy to talk about getting our public face on "real" test-grid

I suspect @jackfrancis and co might be similarly interested vis-a-vis Azure/aks-engine

I would be interested in using testgrid internally, initially to visualize the integration tests I have set up to run against my clusters. The alerting when tests fail consecutively is also something I'm looking forward to having.

would testgrid go in a separate repo?

Thanks for the comments so far all! Keep adding if you've got them. ^^

@neolit123 I think ideally, since conceptually it's not k8s-specific? No idea on if/when it would be in a separate repo though.

i think i see it as a kubernetes/testgrid repo.
my concern here is to avoid the potential bump in volume of test-infra, since it's already one of the higher volume repos under the k org, after k/k and k/website.

Thanks for comments so far all! (Feel free to continue adding to this, or encourage others to).

Just got back from vacation, so I'll be pulling together the plan for early 2019 and will publish that publicly when it's more finalized.

Adding my use case here after discussion with @michelle192837/@amerai: I have a bunch of clusters that are running internally, and I'm starting to run the kubernetes e2e conformance tests against them periodically. There are also internal tests added into the same framework to test for custom stuff. I'd like very much to use testgrid as the overview dashboard for these periodic tests and as an indicator of overall cluster health and compliance. A self-hostable version of testgrid would be very, very appreciated (even if it initially requires setting up private GCP buckets etc).

I'd be happy to act as beta/alpha tester for on-prem/self-hosted testgrid in case that helps!

would love to use testgrid too (in our case to visualise all the different system tests running on different kubernetes clusters & configs in Jenkins X).

Even before the code becomes OSS, if there are public usable docker images for the testgrid containers I'd be happy to try creating helm charts to make them easy to use on k8s (though I may need a hand figuring out how to wire it all together)

Just a quick update; I don't have much externally for a 2019 roadmap atm (been finishing up some work from last year, and going on vacation for 3 weeks in Feb), but I promise I'm still working on this and do intend to do some work in GitHub when I'm back in March.

Thanks again! ^^

(In the meanwhile, if someone's particularly itching to enhance TestGrid, feel free to take a look at #10701 )

We would like to enhance our Prow instance for Kyma with TestGrid. The parts we find most useful are a clear view of test stability (how often tests fail), easy access to logs, and alerting.

Sorry to pester but do we have any new info on what the status of this is? I'd really love to have testgrid available for internal use.

+1, there are few tools for e2e test result visualization, and i think testgrid would definitely be helpful.

Hey everyone, and thanks for your patience! :D Here's our rough timeline for open-sourcing. This covers the next couple quarters (there will be more work to do next year, I'll update when we're further along). Plans subject to change, but feel free to poke if it looks like we're lagging on this. ^^

Q3 (July - September 2019):

  • Move TestGrid project + code into its own repo
  • Add all post-update jobs
      • Entomologist (finds issues associated with tests)
      • Summarizer (creates a summary for a dashboard tab)
      • Alerter (sends email alerts based on summaries)
  • Use open-source code in production services

Q4 (October - December 2019):

  • Add updater (gathers test results and turns into TestGrid state)
      • PubSub update requester (monitor GCS, request updates on changes)
      • Results gatherer (gathers/processes Prow's JUnit results)
      • State updater (translates processed results into state.proto)

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

/remove-lifecycle stale

Has the open-sourcing progressed far enough that we can create a Testgrid tab or dashboard? We're running our tests as Kubernetes workloads that are kicked off by cronjob. We want to visualize the pass/fail history for each test. Alerting and issue creation would be nice to have but not necessary.

Still working on it! But if you just want to add your results to testgrid.k8s.io, you should be okay to do that. https://github.com/kubernetes/test-infra/blob/master/testgrid/config.md should have more info.

Can we use TestGrid to display Prow Job results that are not sent to GCS? I mean by having TestGrid query Prow's internal database of ProwJobs.

Also, I see the code in https://github.com/GoogleCloudPlatform/testgrid. Is that the full source code of TestGrid, and does it include the JS code for the dashboard as well?

At the moment, nope, the results have to be uploaded to GCS to be visible to TestGrid. (This should be easy with decorated jobs, though).

The code in GoogleCloudPlatform/testgrid is still being worked on, so it's not quite the full source code yet; it's a mix of code we're migrating as well as code that lives fully-externally now. (The frontend code is also not in there)

I am planning to use Prow to run an extended system test suite on our private code with ProwJobs, and the way TestGrid's dashboard displays and organizes results would be an almost perfect fit for our use case. The project I am working on is closed source, so I plan to run a private cluster on GKE and run the tests with ProwJobs in the cluster.

My initial idea was to host TestGrid privately as well, but I now realize that it is not yet ready for private deployment. You mention that if we want, we can add our results to testgrid.k8s.io. Could you explain what exactly would need to be made public, and what would be private in such a deployment?

Yes, that part seems clear to me. What I was wondering is how much exposure would happen with the public GCS bucket. For example, would it be possible to expose only the pass/fail status of jobs and the job names, but not the logs (or only a private link to the logs)? Basically, what is the minimal amount of public exposure required?

Also, I assume once TestGrid is fully open-sourced, one could host and run it completely privately?

I think it's possible to expose only pass / fail and entries but I don't know that we're really interested in hosting infrastructure for this purpose ... @spiffxp @fejta @michelle192837

Nominally it's public ~~ for the Kubernetes project, with some other OSS projects displaying their results as well. Currently I don't think we support any closed source projects...

Once fully open sourced I don't see why you couldn't run a private instance.

Theoretically, I think you could have something obfuscate your private results sufficiently for you to be confident in uploading them publicly. The public logs need enough information to show up in TestGrid (as outlined in https://github.com/kubernetes/test-infra/tree/master/gubernator#job-artifact-gcs-layout). But beyond that, you don't need to add logs or anything, you'll just have less useful information (and you won't have the automatic support of Prow's decorated jobs uploading all the related logs and files for you; you'd need some obfuscation + upload step(s)).
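
To make the "obfuscation + upload step" idea a bit more concrete, here is a minimal Python sketch that builds only the per-build marker files and hides the real job name behind a salted hash. Everything here is an assumption for illustration: the `logs/<job>/<build>/` prefix, the `started.json` / `finished.json` names, and the `timestamp` / `passed` / `result` fields are my reading of the layout doc linked above and should be checked against it, and the helper names are hypothetical. It only produces object paths and contents; the actual upload to GCS is left out.

```python
import hashlib
import json


def obfuscate(name, salt):
    """Replace a private name with a stable, salted SHA-256 prefix."""
    digest = hashlib.sha256((salt + name).encode("utf-8")).hexdigest()
    return "job-" + digest[:12]


def minimal_artifacts(job, build, started_at, finished_at, passed, salt):
    """Return {object path: JSON content} for one build, with no logs attached.

    Paths and field names are assumptions based on the linked layout doc.
    """
    prefix = f"logs/{obfuscate(job, salt)}/{build}"
    started = {"timestamp": started_at}
    finished = {
        "timestamp": finished_at,
        "passed": passed,
        "result": "SUCCESS" if passed else "FAILURE",
    }
    return {
        f"{prefix}/started.json": json.dumps(started),
        f"{prefix}/finished.json": json.dumps(finished),
    }


if __name__ == "__main__":
    files = minimal_artifacts("secret-e2e", "42", 1000, 1060, False, salt="s3cr3t")
    for path, body in sorted(files.items()):
        print(path, body)
```

Using a salted hash keeps the public job name stable from run to run (so history lines up in one tab) without revealing the private name; the mapping back to real names stays on your side.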

Thanks a lot, really appreciate it. I guess my last question would be an expected timeline for an alpha version coming out. This link above seems to go part of the way there.

I note that while the TestGrid concept of hierarchically organizing test results and displaying historical trends of these results seems very general, most of the current users seem to be serving the k8s community itself. I work with the PX4 open-source autopilot community (and their closed-source adopters), and am developing tools to run end-to-end system tests involving various component repositories (the autopilot, itself composed of many different components, plus the flight simulator, ground control software, payload interface, etc). Using Prow to execute these tests, and TestGrid to provide the overview & status, makes a lot of sense to me. Unless I am missing something in the details (I still haven't completely onboarded myself), Prow and TestGrid could find a lot of traction in this community.

Heh, that link is definitely a bit out of date. We're continuing to work on open-sourcing, but we'll more likely announce further details on the repo itself (https://github.com/GoogleCloudPlatform/testgrid).

That sounds awesome, yeah! There is an updater that's being worked on in GoogleCloudPlatform/testgrid, if you want to take a look; that might give some more insight into how we parse test formats and whatnot. (Just to be clear, the updater in GoogleCloudPlatform/testgrid is not what does the updates in testgrid.k8s.io at the moment, but we're working on making it that, and it's still useful to look at in the meanwhile.)

/remove-lifecycle stale

aojea commented

/remove-lifecycle stale
/cc

+1 to opensource

Some use case I would like is to have an option to open a dashboard (like https://testgrid.k8s.io/sig-release-master-blocking#gce-cos-master-default) and save that as a png/jpg image, maybe passing a URL param, eg https://testgrid.k8s.io/sig-release-master-blocking#gce-cos-master-default?image=png

thanks!

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

Is S3 support on the roadmap? Currently, it looks like Testgrid works only with GCS.
https://github.com/GoogleCloudPlatform/testgrid/blob/master/util/gcs/gcs.go#L94

Not at the moment; it's not something we have the bandwidth to implement, but contributions are welcome in that respect! I'd recommend opening an issue in https://github.com/GoogleCloudPlatform/testgrid itself to track, and tagging with help-wanted.

Hi @michelle192837 - I have a couple questions out of curiosity, as I'm working with knative community.

What is the current status of testgrid's open sourcing? Can it be run fully with the code in https://github.com/GoogleCloudPlatform/testgrid? Or is it still partially closed-source? Is there an ETA / target date / estimate for when it may be fully out?

/sig testing

The backend can be run completely in open-source on kubernetes. https://github.com/GoogleCloudPlatform/testgrid/blob/master/standalone.md#setting-up-testgrid has specific details on this, with some examples.

Can the readme be updated to reflect the latest status of the open-sourcing initiative? It still points to this PR.

Is there a plan to open-source the Testgrid frontend? I think that would be useful for other projects as well.