KTH/devops-course

Continuous integration

Wikipedia references:

Jenkins and Travis are examples of build tools that automatically build release candidates from the version control system and upload them to the artifact repository.
Often unit tests are also executed in the build tool, to ensure that no bugs are included in the builds.

+1000! CI will likely be the topic of lecture #1

The decision between TravisCI and Jenkins is pretty radical.

TravisCI (as well as the new "kid on the block", CircleCI) is a managed solution and can be used for free by students through the GitHub Student Developer Pack, including on private repositories. The configuration uses a custom format in a .yml file. Documentation is good and configuration is easy: it just works. However, it is not very customisable.

Jenkins is an old and huge open source CD/CI tool written in Java, with thousands of plugins. It needs to be managed on premise (i.e. installed and maintained on KTH servers). Configuration is complex and can be done through the web interface (both the traditional one and Blue Ocean), but the mindset of writing CD/CI pipelines as code (a Jenkinsfile, based on Groovy) is becoming predominant because it makes pipelines portable. Jenkins can be hard to manage and takes a lot of time. As with many open source projects, the documentation is not that great, because developers are expected to know how to do things.

In the end, the most important thing is to make students understand how these tools can be used with other technologies (version control, management tools, containers, etc.) to streamline and automate processes. Maybe the first lecture is not the best moment to introduce them; one often shows first how things (e.g. deployment) can be done manually, and then the need for such tools becomes obvious.

In the end, the most important thing is to make students understand how these tools can be used with other technologies (version control, management tools, containers, etc.) to streamline and automate processes

Agree, this is an essential aspect of DevOps, and one of the primary intended learning outcomes.

The CI "engine" (Travis, Circle, Jenkins, GCB, Bamboo, QuickBuild, TeamCity, ...) is only one part of the CI question.

The key questions are:

  • Being able to properly identify the root causes of failures, whether the failure is "clear" (pass/fail) or fuzzy (performance regression, test flakiness, ...)
  • Being able to report the right failures to the right person when faced with hundreds of thousands of test results, megabytes of log files, and thousands of commits per day.
  • The "broken master" problem and how to avoid it (i.e. tests fail on master because of logically conflicting changes)
  • Avoiding the "Christmas tree" effect (everything turns red).
  • Having a super stable and maintainable CI environment.
  • Balancing stability with speed with(out) ephemeral agents. What level of caching, and where.
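
To make the "broken master" item above concrete, here is a minimal Python sketch, not taken from any particular CI product. Everything in it is hypothetical: a change is modelled as a set of feature names, `run_tests` is a stand-in test suite that fails only when `rename_api` and `call_old_api` are both present, and `merge_queue` is just an illustrative name. It shows two changes that each pass against the base but conflict logically, and a queue that re-tests every candidate on top of the current head before merging.

```python
from typing import Callable, Iterable

# A "change" is modelled as (name, set of feature names it adds to the tree).
Change = tuple[str, frozenset[str]]


def run_tests(tree: frozenset[str]) -> bool:
    """Hypothetical test suite: fails when two logically conflicting
    features, "rename_api" and "call_old_api", are both present."""
    return not {"rename_api", "call_old_api"} <= tree


def merge_queue(
    base: frozenset[str],
    changes: Iterable[Change],
    tests: Callable[[frozenset[str]], bool] = run_tests,
) -> tuple[frozenset[str], list[str], list[str]]:
    """Merge changes one at a time, testing each on top of the current head."""
    head = base
    merged: list[str] = []
    rejected: list[str] = []
    for name, features in changes:
        candidate = head | features  # what master would look like after merging
        if tests(candidate):
            head = candidate
            merged.append(name)
        else:
            rejected.append(name)  # master is left untouched
    return head, merged, rejected


base = frozenset({"core"})
change_a = ("A", frozenset({"rename_api"}))
change_b = ("B", frozenset({"call_old_api"}))

# Each change passes when tested alone against the base it was written on...
assert run_tests(base | change_a[1]) and run_tests(base | change_b[1])
# ...but merging both without re-testing would break master.
assert not run_tests(base | change_a[1] | change_b[1])

head, merged, rejected = merge_queue(base, [change_a, change_b])
print(merged, rejected)  # ['A'] ['B']
```

Master stays green here because B is tested against a head that already contains A. The price is one full test run per change, in sequence, so the approach only scales if the test suite is fast or the queue batches changes.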

I think the two questions are often interconnected. If you use a hosted CD/CI tool like Travis or Circle, you often have very little control over notification and caching strategies. How the real machines are set up is also often hidden. Jenkins, on the contrary, offers all the needed customisation, at a high maintenance cost. It is also often the case that Jenkins becomes a top security concern, since it is one of the largest sources of vulnerabilities.
Yet all the questions above are very interesting.

Regarding the last one, I would say that most platforms are moving towards relying heavily on Kubernetes to manage executors, or at least towards ephemeral agents. Also, most modern CD/CI infrastructures build software as containers that are then published to some registry. In this case there are a lot of interesting questions regarding the Docker-in-Docker problem.

Some other questions that may make sense to address, depending on the level of the students:

  • Backend (typically Linux) versus frontend CI (all sorts of platforms). Why backend is so much simpler. We need to make sure students understand that not everything is Linux-based.
  • Effects of CI duration on development velocity.
  • Merge queues, merge trains on "master".
  • Effects of CI flakiness on bisecting algorithms, on load, on user satisfaction, on velocity
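
As a rough illustration of the last point (flakiness versus bisecting and load), here is a small deterministic Python simulation. It is only a sketch under made-up assumptions: ten commits, a real regression starting at commit 6 (`FIRST_BAD`), and a single spurious failure on the first run of the good commit 4 (`FLAKY_COMMIT`). The names `run_test`, `fails_with_retries` and `bisect_first_bad` are hypothetical helpers, not part of git or any CI tool.

```python
from collections import Counter
from typing import Callable

FIRST_BAD = 6     # assumption: commits 6..9 contain a real regression
FLAKY_COMMIT = 4  # assumption: a good commit whose first run fails spuriously
runs: Counter = Counter()  # test executions per commit, a proxy for CI load


def run_test(commit: int) -> bool:
    """Simulated test run; returns True when the test FAILS."""
    runs[commit] += 1
    if commit >= FIRST_BAD:
        return True
    return commit == FLAKY_COMMIT and runs[commit] == 1


def fails_with_retries(commit: int, attempts: int) -> bool:
    """Treat a commit as bad only if every attempt fails."""
    return all(run_test(commit) for _ in range(attempts))


def bisect_first_bad(n_commits: int, is_bad: Callable[[int], bool]) -> int:
    """Binary search for the first bad commit, assuming the last one is bad."""
    lo, hi = 0, n_commits - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


# Naive bisect trusts a single run and is misled by the flaky failure.
naive = bisect_first_bad(10, run_test)
naive_runs = sum(runs.values())

# Bisect with up to two attempts per commit finds the real culprit,
# but executes more test runs.
runs.clear()
robust = bisect_first_bad(10, lambda c: fails_with_retries(c, 2))
robust_runs = sum(runs.values())

print(naive, naive_runs)    # 4 3
print(robust, robust_runs)  # 6 7
```

In this toy setup the naive bisect blames the innocent commit 4, while the retrying version finds commit 6 at roughly twice the number of test executions. Retries only protect against spurious failures, not spurious passes, and real flakiness is probabilistic rather than scripted as it is here.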

Very original CI:
The List is the Process: Reliable Pre-Integration Tracking of Commits on Mailing Lists
https://arxiv.org/abs/1902.03147

An empirical study of the long duration of continuous integration builds
http://link.springer.com/10.1007/s10664-019-09695-9

The impact of continuous integration on other software development practices: a large-scale empirical study
https://par.nsf.gov/servlets/purl/10063078

Tool list for demo/presentation:

Jenkins, TravisCI, QuickBuild, TeamCity, Concourse, CircleCI, GitLab, etc.

Continuous Integration Theater
https://arxiv.org/abs/1907.01602

Jenkins-stargate is a unit test automation framework for all your jenkins pipeline code such as jenkins shared libraries and Jenkinsfiles.
https://github.com/swedbank/jenkins-stargate

Meetup on Feb 12 2020: https://www.meetup.com/DevOps-Stockholm/events/268502971/

Bazel, Build and test software
https://bazel.build/

game.ci
Continuous Integration tools for game development. Build and test Unity projects.

How Do Software Developers Use GitHub Actions to Automate Their Workflows?
http://arxiv.org/pdf/2103.12224

Buildkite is a platform for running fast, secure, and scalable continuous integration pipelines on your own infrastructure.
https://buildkite.com/

"CI/CD Pipelines Evolution and Restructuring - A Qualitative and Quantitative Study." https://dblp.org/rec/conf/icsm/ZampettiGBP21

A relevant and hard topic in CI: handling test flakiness

Recent paper
A Large-Scale Longitudinal Study of Flaky Tests
https://dl.acm.org/doi/pdf/10.1145/3428270

A hard and essential topic for CI for games: defect prediction
For example:
Investigating the Practicality of Just-in-time Defect Prediction with Semi-supervised Learning on Industrial Commit Data
https://www.diva-portal.org/smash/get/diva2:1336751/FULLTEXT02.pdf

Faster builds with highly parallel GitHub Actions
https://rnorth.org/faster-parallel-github-builds/

Earthly is a build automation tool from the same era as your code. It allows you to execute all your builds in containers. This makes them self-contained, repeatable, portable and parallel.
https://docs.earthly.dev/

Perforce plugin for Jenkins
https://github.com/jenkinsci/p4-plugin

Tekton is an open-source framework for creating CI/CD systems
https://tekton.dev/

On the Use of GitHub Actions in Software Development Repositories
https://decan.lexpage.net/files/ICSME-2022.pdf

Turbo is an incremental bundler and build system optimized for JavaScript and TypeScript, written in Rust.
https://turbo.build/

Analyzing the Effects of CI/CD on Open Source Repositories in GitHub and GitLab
http://arxiv.org/abs/2303.16393

SoK: Machine Learning for Continuous Integration.
http://arxiv.org/abs/2304.02829

T-Evos: A Large-Scale Longitudinal Study on CI Test Execution and Failure
http://ieeexplore.ieee.org/document/9933015

The CI/CD Collective at stackoverflow: https://stackoverflow.com/collectives/ci-cd

Kubernetes controller for GitHub Actions self-hosted runners
https://github.com/actions/actions-runner-controller

Chronicles of CI/CD: A Deep Dive into its Usage Over Time
https://arxiv.org/abs/2402.17588

Martin Fowler's post on continuous integration, 2024

Keeping master green at scale
https://sundaram.io/slides/submitqueue.pdf

Developer-Applied Accelerations in Continuous Integration 2024
https://rebels.wwwtest1.cs.uwaterloo.ca/papers/ase2024_yin.pdf

See also papers from the same group: https://rebels.cs.uwaterloo.ca/publications.html