shibatch/sleef

Restart/Fix CI tests

blapie opened this issue · 6 comments

blapie commented

We no longer have access to the machines previously used in CI. Therefore, CI is very limited at the moment and only patches improving test coverage will be merged.

#476 adds a layer of CI based on x86 GitHub-hosted runners, visible in GitHub Actions. Non-x86_64 architectures are tested in qemu; x86_64 is tested natively. Only Ubuntu is used in tests.
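As a rough illustration of the native/qemu split described here (not the actual workflow in #476), a test launcher on an x86_64 host could pick its command as sketched below. The function name `launch_command`, the `./tester` binary and the sysroot paths are hypothetical. The emulator names assume QEMU user-mode binaries are installed under their usual names.

```python
# Hypothetical sketch: decide how to run a test binary for a target
# architecture when the CI host is x86_64. Not taken from the SLEEF workflows.

HOST_ARCH = "x86_64"  # assumption: GitHub-hosted x86_64 runner

# Assumed mapping from target architecture to a QEMU user-mode emulator.
QEMU_USER_BINARIES = {
    "aarch64": "qemu-aarch64",
    "arm": "qemu-arm",
    "ppc64": "qemu-ppc64",
    "s390x": "qemu-s390x",
}


def launch_command(arch, test_binary, sysroot=None):
    """Return the argv list used to run `test_binary` built for `arch`."""
    if arch == HOST_ARCH:
        # Native target: run the binary directly.
        return [test_binary]
    if arch not in QEMU_USER_BINARIES:
        raise ValueError(f"no emulator configured for {arch!r}")
    command = [QEMU_USER_BINARIES[arch]]
    if sysroot is not None:
        # -L tells QEMU user mode where to find the target's dynamic
        # loader and shared libraries.
        command += ["-L", sysroot]
    command.append(test_binary)
    return command


if __name__ == "__main__":
    for arch in ["x86_64", "aarch64", "s390x"]:
        # The sysroot layout below is an assumption (Ubuntu cross toolchains).
        print(arch, launch_command(arch, "./tester", sysroot=f"/usr/{arch}-linux-gnu"))
```

Adding an architecture then only means adding one entry to the mapping, which keeps the emulated targets easy to audit.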

This does not cover all tests that Jenkins and AppVeyor used to cover.

Follow-up work is expected to make as much use as possible of GitHub-hosted and self-hosted runners in GHA to test more architectures natively, more OSes, more compilers, ...

This issue tracks this effort.

Note: AppVeyor.yml, Jenkinsfile and travis.yml should be removed on closure.

blapie commented

A roadmap to replace/extend pre-existing CI:

  • Components/Features (tested at least on Linux)
    • Test libm #476
    • Test DFT #476
      • Add Risc-V when supported #503
    • Test QUAD #476
      • Add Risc-V when supported #503
    • Support for OpenMP
    • Test LTO
    • Test packaging features (currently missing, as described in #412)
    • Test examples and nested project (and submodule) #550
  • Compiler
    • test at least with the latest possible gcc (currently gcc-11, since more recent versions are waiting for a bug fix to be propagated) #476
    • test at least with llvm/clang latest (available in OS, llvm-17 for ubuntu-latest) #477
    • have both old and new versions of gcc and clang?
      • The issue is which versions to choose, and how to implement this nicely while keeping maintenance cost low.
      • This attempt does seem quite a burden to maintain #541
    • add MSVC test on Windows
    • CUDA?
    • ICC?
  • OS
    • test at least with ubuntu (latest) #476
    • add support for more OS-es in general #493
    • add support for macOS and windows (available with github-hosted runners)
      • #543 (macOS)
      • #540 (Windows - DFT only)
      • #544 (Windows - Libm/Quad with MinGW)
    • add support for iOS, Android, FreeBSD, ... (might require docker) #493
      • Android requested for pyTorch #584
  • Architectures
    • test natively on x86_64 at least (github-hosted runners) #476
    • test natively on aarch64 (arm-hosted runners)
      • build and test natively on aarch64 with macOS (M1). #543
      • build and test natively on linux/ubuntu_22 (arm-hosted) #581
    • test with qemu on x86_64, arm, ppc64 and s390x #476
      • fix test with qemu on aarch64 when building with llvm #485
      • fix test with qemu on s390x #484
      • remove aarch64 cross-compilation and cross-tests #581
    • test with qemu on i386
    • test with qemu on risc-v #477
      • add support (build/test) with gcc. Requires gcc HEAD or gcc 14. #601
  • Schedule/Trigger
    • Discuss removing manual trigger and skipping expensive tests on pull_requests #547
    • Re-schedule and display status of GHA on push to master branch. #553

The table of supported environments in the README tracks support/compatibility across architectures, OSes and compilers.

@blapie If you are interested in fast native CI for aarch64, CircleCI provides it for free for open source projects. I have been using it in several of my projects, e.g.:

https://app.circleci.com/pipelines/github/bluescarni/heyoka/2861/workflows/66959af0-4648-483f-b9c2-d7cc282508ef/jobs/6580

@bluescarni Thanks for the suggestion! That sounds brilliant. It turns out we have a layer of internal/private CI in which tests are run natively on AArch64 and we were waiting for github to provide its own runners to switch to native aarch64 with GHA.
If it is easy to set up, we might give that a go in the meantime. What OS are available? What machine(s) does it run on?

What would be even more helpful is runners for other architectures like PPC, Risc-V, IBM/Z, ... for which we rely solely on qemu for testing. Any insight on that?

In the long run we might even rely on a layer of self-hosted runners if GitHub-hosted ones are too limited. But I doubt that will include these missing architectures.

What OS are available? What machine(s) does it run on?

OS is Ubuntu, the machines are these:

https://circleci.com/docs/using-arm/

IIRC, for open source projects the largest instance available is arm.large, which at 4 cores with 16GB of RAM is fairly good.

CircleCI at the moment provides a rather generous amount of free monthly credits for open source projects (I use it regularly and never got near exhausting the available minutes). Of course, like with any "free" CI provider, conditions could change suddenly and unexpectedly...

What would be even more helpful is runners for other architectures like PPC, Risc-V, IBM/Z, ... for which we rely solely on qemu for testing. Any insight on that?

Travis CI used to have bare-metal PPC64 machines in their CI offerings. I used them until a couple of years ago, but I think they no longer offer free CI plans :(

Another potential option for PPC64 is to request access to a server from OSU Open-Source Labs:

https://osuosl.org/services/powerdev/request_hosting/

I did it and the process was quite smooth. I am not 100% sure they would be ok to host a CI service on their machines... Still, it is quite handy for interactive debugging :)

I updated the roadmap with current progress in CI testing.

Also please review our plan to re-organise the scheduling of workflows, to get more control over the usage of CPU runners. #547

@bluescarni The native CI for aarch64 is solved, we now provide arm-hosted runners. We are aware it is not future-proof, but that comes with some benefits, e.g. we might be able to provide a wider range of OS-es/AMIs in CI through these runners.