Continuous integration
Wikipedia references:
- https://en.wikipedia.org/wiki/Continuous_integration
- https://en.wikipedia.org/wiki/Build_automation
- https://en.wikipedia.org/wiki/Monorepo
Jenkins and Travis are examples of build tools that automatically build release candidates from the version control system and upload them to the artifact repository.
Unit tests are often executed in the build tool as well, to reduce the chance that bugs end up in the builds.
+1000! CI will likely be the topic of lecture #1
The decision between TravisCI and Jenkins is pretty radical.
TravisCI (as well as the newer "kid on the block", CircleCI) is a managed solution that students can use for free, including on private repositories, through the GitHub Student Developer Pack. The configuration is written in a tool-specific format in a .yml file. Documentation is good and configuration is easy: it just works. However, it is not very customisable.
Jenkins is an old and huge open source CD/CI tool built in Java, with thousands of plugins. It needs to be managed on premise (i.e. installed and maintained on KTH servers). Configuration is complex and can be done through the web interface (both the traditional one and Blue Ocean), but lately the mindset of writing CD/CI pipelines as code (a Jenkinsfile based on Groovy) for portability is becoming predominant. Jenkins can be hard to manage and can require a lot of time. As with many open source projects, the documentation is not that great because developers are expected to know how to do it.
In the end, the most important thing is to make students understand how these tools can be used with other technologies (version control, management tools, containers, etc.) to streamline and automate processes. Maybe the first lecture is not the best moment to introduce them; often one first shows how things (e.g. deployment) can be done manually, and then the need for such tools becomes obvious.
> In the end, the most important thing is to make students understand how these tools can be used with other technologies (version control, management tools, containers, etc.) to streamline and automate processes
Agreed, this is an essential aspect of DevOps, and one of the primary intended learning outcomes.
The CI "engine" (Travis, Circle, Jenkins, GCB, Bamboo, QuickBuild, TeamCity, ...) is only one part of the CI question.
The key questions are:
- Being able to properly identify the root causes of failures (failures being either "clear" (pass/fail) or fuzzy (performance regressions, test flakiness, ...))
- Being able to report the right failures to the right person when faced with hundreds of thousands of test results, megabytes of log files, and thousands of commits per day.
- The "broken master" problem and how to avoid it (i.e. tests fail on master because of logically conflicting changes)
- Avoiding the "Christmas tree" effect (everything turns red).
- Having a super stable and maintainable CI environment.
- Balancing stability with speed with(out) ephemeral agents. What level of caching, and where.
I think the 2 questions are often interconnected. If you use a hosted CD/CI tool like Travis or Circle, you often have very little control over notification and caching strategies. How the underlying machines are set up is also often hidden. Jenkins, by contrast, offers all the needed customisation, at a high maintenance cost. Jenkins also often becomes a top security concern, since it is one of the largest sources of vulnerabilities.
Yet all the questions above are very interesting.
Regarding the last one, I would say that most platforms are moving towards relying heavily on Kubernetes to manage executors, or at least towards ephemeral agents. Also, most modern CD/CI infrastructures build software as containers that are then pushed to some registry. In this case there are a lot of interesting questions regarding the Docker-in-Docker problem.
Some other questions that may make sense to address, depending on the level of the students:
- Backend (typically Linux) versus frontend CI (all sorts of platforms). Why backend is so much simpler. We need to make sure students understand that not everything is Linux-based.
- Effects of CI duration on development velocity.
- Merge queues, merge trains on "master".
- Effects of CI flakiness on bisecting algorithms, on load, on user satisfaction, on velocity
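The effect of flakiness on bisecting and on load can be illustrated with some back-of-the-envelope arithmetic. The sketch below is only a model under loud assumptions: tests fail spuriously independently of each other, every test has the same spurious failure probability, and a bisection over n commits takes ceil(log2(n)) full suite runs. The function names are made up for this example.

```python
def spurious_red_probability(n_tests: int, p_flaky_fail: float, retries: int = 0) -> float:
    """Probability that a fully correct commit still gets a red build.

    Assumes independent tests; a test only counts as failed if the first
    run and all retries fail.
    """
    per_test = p_flaky_fail ** (retries + 1)
    return 1.0 - (1.0 - per_test) ** n_tests


def bisect_steps(n_commits: int) -> int:
    """Number of suite runs a binary search over n_commits needs: ceil(log2(n))."""
    return max(n_commits - 1, 0).bit_length()


def bisect_misled_probability(n_commits: int, spurious_red: float) -> float:
    """Probability that at least one bisection step sees a spurious red build."""
    return 1.0 - (1.0 - spurious_red) ** bisect_steps(n_commits)


# 1000 tests, each failing spuriously once in a thousand runs.
q = spurious_red_probability(1000, 0.001)
q_retry = spurious_red_probability(1000, 0.001, retries=1)
print(f"red build on a good commit: {q:.1%} without retry, {q_retry:.2%} with one retry")
print(f"bisect over 1024 commits misled: {bisect_misled_probability(1024, q):.1%} "
      f"vs {bisect_misled_probability(1024, q_retry):.1%}")
```

Under these assumptions, a per-test flakiness of 0.1% already turns roughly 63% of builds on good commits red, so a naive bisection is almost certainly misled at some step. A single retry of failed tests brings both numbers down sharply, at the cost of extra load.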
Very original CI:
The List is the Process: Reliable Pre-Integration Tracking of Commits on Mailing Lists
https://arxiv.org/abs/1902.03147
An empirical study of the long duration of continuous integration builds
http://link.springer.com/10.1007/s10664-019-09695-9
The impact of continuous integration on other software development practices: a large-scale empirical study
https://par.nsf.gov/servlets/purl/10063078
Pipes for Bitbucket Cloud
https://bitbucket.org/blog/meet-bitbucket-pipes-30-ways-to-automate-your-ci-cd-pipeline
Tool list for demo/presentation:
Jenkins, TravisCI, QuickBuild, TeamCity, Concourse, CircleCI, GitLab, etc.
Visualize the Jenkins build
https://github.com/jenkinsci/yet-another-build-visualizer-plugin
Continuous Integration at Google Scale
https://www.eclipsecon.org/2013/sites/eclipsecon.org.2013/files/2013-03-24%20Continuous%20Integration%20at%20Google%20Scale.pdf
Continuous Integration Theater
https://arxiv.org/abs/1907.01602
Jenkins-stargate is a unit test automation framework for all your Jenkins pipeline code, such as Jenkins shared libraries and Jenkinsfiles.
https://github.com/swedbank/jenkins-stargate
Meetup on Feb 12 2020: https://www.meetup.com/DevOps-Stockholm/events/268502971/
Bazel, Build and test software
https://bazel.build/
game.ci
Continuous Integration tools for game development. Build and test Unity projects.
How Do Software Developers Use GitHub Actions to Automate Their Workflows?
http://arxiv.org/pdf/2103.12224
Buildkite is a platform for running fast, secure, and scalable continuous integration pipelines on your own infrastructure.
https://buildkite.com/
"CI/CD Pipelines Evolution and Restructuring - A Qualitative and Quantitative Study." https://dblp.org/rec/conf/icsm/ZampettiGBP21
A relevant and hard topic in CI: handling test flakiness
Recent paper
A Large-Scale Longitudinal Study of Flaky Tests
https://dl.acm.org/doi/pdf/10.1145/3428270
A hard and essential topic for CI for games: defect prediction
For example:
Investigating the Practicality of Just-in-time Defect Prediction with Semi-supervised Learning on Industrial Commit Data
https://www.diva-portal.org/smash/get/diva2:1336751/FULLTEXT02.pdf
Faster builds with highly parallel GitHub Actions
https://rnorth.org/faster-parallel-github-builds/
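One common way to get faster builds from parallel CI jobs is to shard the test suite so that every job finishes at about the same time. The sketch below is not taken from the linked post; it is a generic greedy "longest test first" split, assuming historical per-test durations are available. The function name `shard_tests` and the example durations are hypothetical.

```python
import heapq
from typing import Dict, List


def shard_tests(durations: Dict[str, float], shards: int) -> List[List[str]]:
    """Assign tests to shards, always giving the next-longest test to the
    currently least-loaded shard (greedy heuristic, not guaranteed optimal)."""
    heap = [(0.0, index) for index in range(shards)]  # (total seconds, shard index)
    heapq.heapify(heap)
    assignment: List[List[str]] = [[] for _ in range(shards)]
    # Longest tests first; ties are broken by name to keep the split deterministic.
    for name, seconds in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        load, index = heapq.heappop(heap)
        assignment[index].append(name)
        heapq.heappush(heap, (load + seconds, index))
    return assignment


# Hypothetical durations in seconds, e.g. collected from a previous CI run.
example_durations = {"a": 30, "b": 20, "c": 20, "d": 10, "e": 10, "f": 10}
example_shards = shard_tests(example_durations, shards=2)
example_loads = [sum(example_durations[t] for t in shard) for shard in example_shards]
print(example_shards, example_loads)
```

Each parallel job would then run only its own shard, so the wall-clock time of the build is bounded by the slowest shard rather than by the sum of all tests.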
Earthly is a build automation tool from the same era as your code. It allows you to execute all your builds in containers. This makes them self-contained, repeatable, portable and parallel.
https://docs.earthly.dev/
An amazing GitHub Action
https://github.com/TejasvOnly/random-rickroll
Perforce plugin for Jenkins
https://github.com/jenkinsci/p4-plugin
Tekton is an open-source framework for creating CI/CD systems
https://tekton.dev/
On the Use of GitHub Actions in Software Development Repositories
https://decan.lexpage.net/files/ICSME-2022.pdf
Turbo is an incremental bundler and build system optimized for JavaScript and TypeScript, written in Rust.
https://turbo.build/
Analyzing the Effects of CI/CD on Open Source Repositories in GitHub and GitLab
http://arxiv.org/abs/2303.16393
SoK: Machine Learning for Continuous Integration.
http://arxiv.org/abs/2304.02829
T-Evos: A Large-Scale Longitudinal Study on CI Test Execution and Failure
http://ieeexplore.ieee.org/document/9933015
The CI/CD Collective at Stack Overflow: https://stackoverflow.com/collectives/ci-cd
Kubernetes controller for GitHub Actions self-hosted runners
https://github.com/actions/actions-runner-controller
Chronicles of CI/CD: A Deep Dive into its Usage Over Time
https://arxiv.org/abs/2402.17588
Martin Fowler's post on continuous integration, 2024
Keeping master green at scale
https://sundaram.io/slides/submitqueue.pdf
Developer-Applied Accelerations in Continuous Integration, 2024
https://rebels.wwwtest1.cs.uwaterloo.ca/papers/ase2024_yin.pdf
See also papers from the same group: https://rebels.cs.uwaterloo.ca/publications.html