cnti-testcatalog/testsuite

Urgent: Move off CNCF Equinix Resources

Opened this issue · 15 comments

Describe the software update
Need to move cnf-testsuite out of the CNCF Equinix donated resources

Describe why update is needed
Equinix has changed the policy so only CNCF projects are eligible to use the donated resources

Additional context

  • #1777 - the number of resources was lowered in January
  • need to decide where to move the remaining resources

Tasks

  • Determine replacement for Equinix resources
  • Start trial at UNH Labs as a Service
  • CI migration to UNH
    • Set up GitHub Actions runner on UNH lab system
    • Update GitHub Actions configuration for testsuite to point to UNH runner
    • Disable use of Equinix CI runner
    • Delete Equinix CNTi CI runner
  • Remote dev system
    • Back up remote dev box
    • Set up new remote dev box on UNH
    • Migrate any needed data from the Equinix dev box
    • Delete Equinix dev box

@rannyh LFN and/or community-donated resources will need to be allocated for the CNTi project's needs. This includes the CI pipeline for GitHub Actions.

Some ideas were put forward in fall 2023 and early 2024, including CircleCI as a cost-effective option. Some existing LFN resources have also been mentioned.

If something can be identified soon, even for temporary use, the current CI can be moved over with limited interruption. Otherwise the CI will be shut down until something is found, which will slow down integrating contributions (e.g. pull requests in particular will face a big obstacle).

cc: @martin-mat @Smitholi67 @wavell

Next steps:

  • Vulk: prepare hardware requirements for runners (send to LJ)
    • What we use now (non-optimal)
    • What is optimal
    • Short-term suggestions
    • Longer-term suggestions
  • send requirements/recommendations to LJ
  • LJ will send requirements/recommendations to LF IT

Suggested resource requirements for the CNTi project CI runners, based on Equinix machine types:

  • Ideal: 7x c3.small.x86 - increases the speed of testing and allows some concurrent testing
  • Dynamic: 6x c3.small.x86 - dynamic; slower, but probably better than what we have now and at much lower cost
  • Adequate: 3x c3.small.x86 - what is currently in use; each test is delayed by hours

The Equinix c3.small.x86 is a legacy on-demand server with the following hardware configuration:

  • 1 x Intel® Xeon® E-2278G
  • 8 cores @ 3.40 GHz
  • 2 x 480 GB SSD
  • 32 GB RAM
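
To make the three tiers easier to compare, the aggregate capacity of each can be worked out from the per-machine figures listed above. The following is a minimal sketch; the names `MachineSpec`, `TIERS`, `aggregate`, and `tier_totals` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MachineSpec:
    cores: int
    ram_gb: int
    ssd_gb: int


# c3.small.x86 figures as listed in this issue:
# 8 cores, 32 GB RAM, 2 x 480 GB SSD.
C3_SMALL_X86 = MachineSpec(cores=8, ram_gb=32, ssd_gb=2 * 480)

# Machine counts for the three suggested tiers.
TIERS = {"Ideal": 7, "Dynamic": 6, "Adequate": 3}


def aggregate(count, spec=C3_SMALL_X86):
    """Return the combined capacity of `count` identical machines."""
    return MachineSpec(
        cores=spec.cores * count,
        ram_gb=spec.ram_gb * count,
        ssd_gb=spec.ssd_gb * count,
    )


def tier_totals():
    """Map each tier name to its combined capacity."""
    return {name: aggregate(count) for name, count in TIERS.items()}


if __name__ == "__main__":
    for name, total in tier_totals().items():
        print(f"{name}: {total.cores} cores, "
              f"{total.ram_gb} GB RAM, {total.ssd_gb} GB SSD")
```

These totals are raw sums only; they say nothing about how a replacement host (for example a larger LaaS server running several VMs) would actually perform.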

CircleCI will probably be the most cost-effective hosted solution.

To be discussed/reviewed:

  1. LF IT came back with a question: GitHub increased the base size of its free runners early this year. Would that increase make it possible to use the free GitHub runners?

  2. LFN has a “Lab as a Service” (LaaS) program that is run out of the University of New Hampshire. Because CNTi is an LFN initiative, its runners can potentially be hosted under the LFN LaaS program.
    This is from the LaaS lead at UNH: The specs of the servers in LaaS are a bit larger than the ones listed in the GitHub issue, except for the processor generation: the LaaS servers have Intel Broadwell generation Xeon CPUs. That would only matter if the testing actually requires instruction sets that are not available in that generation (e.g. AVX-512).
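
One way to check the instruction-set question on a candidate host is to inspect the CPU flags the Linux kernel reports. The sketch below assumes an x86 Linux system where `/proc/cpuinfo` has a `flags` line; the function names are hypothetical, and whether the testsuite needs AVX-512 at all is not established in this thread.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line.

    Assumes the x86 Linux /proc/cpuinfo layout ("flags : fpu vme ...").
    Returns an empty set if no such line is present.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_avx512(cpuinfo_text):
    """True if any AVX-512 feature flag (avx512f, avx512bw, ...) is listed."""
    return any(flag.startswith("avx512") for flag in cpu_flags(cpuinfo_text))


def host_has_avx512(path="/proc/cpuinfo"):
    """Check the machine this script runs on (x86 Linux only)."""
    return has_avx512(Path(path).read_text())
```

Running `host_has_avx512()` on both an existing runner and a LaaS server would show whether the two differ in this respect.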

@taylor @wavell @denverwilliams @agentpoyo @HashNuke @martin-mat please review the options above

  1. Free GitHub runners
    They can only be used if part of the test coverage is disabled or removed, including some telecom and networking-focused tests. That would reduce velocity as bugs and other issues slip through while the project moves forward.

  2. “Lab as a Service” (LaaS) program from the University of New Hampshire
    This should work for the CI system and may allow increasing velocity with more available resources.

  3. CircleCI -- $200-$300/month
    Another recommended alternative that would meet the velocity requirements.

@lilluzzi when will the LaaS systems be available so that we can start using them and begin the migration?

Next Step:

  • @denverwilliams @taylor Start trial with “Lab as a Service” (LaaS) program from the University of New Hampshire

Currently testing UNH systems. Not sure yet which type should be used or how many.

@lilluzzi @rannyh FYI,
Currently testing UNH systems.

Earlier today the CNCF lab Equinix systems used for this project were stopped. They have been turned back on to allow the CI runners to work and builds to continue. I expect they will be shut down sooner rather than later.

@denverwilliams the following systems are ready for further testing

  • 10.200.141.136
  • 10.200.142.204

Mon, May 20 update: Denver is not able to access the systems above via VPN as expected. Debugging with UNH support in progress.
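
While debugging the VPN access, a quick TCP reachability probe can help separate a routing problem from a login problem. This is only a sketch: the helper name `is_reachable` is hypothetical, and the default port 22 (SSH) is an assumption, since the thread does not say which service the systems expose.

```python
import socket


def is_reachable(host, port=22, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened.

    Port 22 (SSH) is an assumed default, not something stated in the issue.
    Timeouts, refused connections, and unroutable hosts all surface as
    OSError subclasses and are reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

From a machine connected to the UNH VPN, calling `is_reachable("10.200.141.136")` and `is_reachable("10.200.142.204")` would indicate whether the hosts accept connections at all; a `False` result points at the VPN or network path rather than at credentials.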

There have been continued issues with running a docker + kind setup directly on the system. As a result, we have switched to running the docker + kind runner inside a VM, which will allow fuller use of the larger resources on the UNH system.

A Vagrant setup for the GitHub Actions runners is in progress and is expected to work within the next 24 hours. After this, automated spec testing should speed up.

@martin-mat