[feature] Warn of known bugs in used software (bug-for-bug compatibility)
joukewitteveen opened this issue · 7 comments
Word of warning: This issue came up at an interesting talk by @annakrystalli. I have no time to help out, but she encouraged me to post this regardless.
Consider a hypothetical library X that, in version 1.0.0, contains an obscure bug where `0.5683/0` evaluates to `-infinity`, in violation of IEEE 754. From the perspective of the library developers this is a silly bug, and a new version is released quickly in which `0.5683/0` evaluates to `+infinity`. Despite https://xkcd.com/1172/, this is seen as an improvement in adhering to documented behavior, so the new release gets version number 1.0.1 (bug fix) and not 2.0.0 (breaking change).
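For reference, IEEE 754 says that a finite positive number divided by positive zero is `+infinity` (and by negative zero, `-infinity`). A minimal sketch of this in Python, using NumPy because plain Python floats raise `ZeroDivisionError` instead of following IEEE 754 here:

```python
import numpy as np

# IEEE 754: positive finite / +0.0 -> +inf, and positive finite / -0.0 -> -inf.
# NumPy follows IEEE 754 semantics for float64 division.
with np.errstate(divide="ignore"):  # silence the divide-by-zero RuntimeWarning
    pos = np.float64(0.5683) / np.float64(0.0)
    neg = np.float64(0.5683) / np.float64(-0.0)

print(pos)  # inf
print(neg)  # -inf
```

The hypothetical 1.0.0 bug in the issue text would correspond to getting `-inf` from the first division.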
If we look at line 13 of `rrtools/inst/templates/Dockerfile` (at 58e842e), the `apt-get update` there breaks this: which package versions end up in the image depends on when the image is built.
The Bug
Ideally, `rrtools` should refer to frozen repositories or otherwise limit the possible impact of `apt-get update`.
The Feature Request
Perhaps not as part of `rrtools`, but it would be nice if there were a tool that could take a compendium, analyze the software and versions used in it, and warn the author(s) if any of that software is known to contain a (numerical) bug. We do not want outcomes to be skewed by software errors, but there is very little to protect us from them.
Thanks for sharing your observation. Do you have any suggestions about how we can fix this?
Looking at some packages in the Ubuntu package repository, it appears that old versions of packages are not removed. Maybe The Bug can be fixed by not running `apt-get update` and instead reverting to a known set of package versions.
For The Feature Request, I am not so sure. This is probably an entire project in and of itself. The logic you had probably figured out yourself already: scrape the compendium for all versions of all dependencies, check with some central database for known flaws, report.
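The scrape-check-report logic could be sketched roughly as follows. Note that everything here is hypothetical: the dependency mapping, the flaw database, and the report format are invented for illustration, not part of any existing tool.

```python
# Hypothetical sketch: flag compendium dependencies that appear in a
# database of known-buggy versions. All names and data are made up.

# Pretend this was scraped from the compendium (package -> exact version).
compendium_deps = {"libfoo": "1.0.0", "libbar": "2.3.1"}

# Pretend this came from some central database of known flaws.
known_flaws = {
    ("libfoo", "1.0.0"): "0.5683/0 evaluates to -infinity (violates IEEE 754)",
}

def report_flaws(deps, flaws):
    """Return a warning line for each dependency with a known flaw."""
    return [
        f"{pkg} {ver}: {flaws[(pkg, ver)]}"
        for pkg, ver in sorted(deps.items())
        if (pkg, ver) in flaws
    ]

for warning in report_flaws(compendium_deps, known_flaws):
    print("WARNING:", warning)
```

The hard part, as noted above, is not this reporting step but reliably extracting exact versions from a compendium and maintaining the flaw database.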
Thanks for your reply. It looks like we may be able to do something like this, known as 'version-pinning':
```dockerfile
RUN apt-get update && apt-get install -y \
    package-foo=1.2.*
```
Another method for tackling this might be the containerit package. If that method addresses your concern, maybe we should use containerit to generate the Dockerfiles here? What do you think?
As Matthias Hinz points out in the thread you linked, @benmarwick, a good, general solution of this issue is currently impossible:
> The main problem is, that the repositories mostly provide the most recent version of a package only. Even if there were repositories with historic packages, we would still have to match libraries with package names and map between version-tags, which may vary depending on the platform and architecture.
As far as I understand reproducibility with Docker, you should save the resulting image (with `docker save`) at the end of the project and not rely on the Dockerfile at all. A Dockerfile is a build instruction that only works if external services (software archives) are working as expected, and you can't rely on them for the far future of 5-10 years (pun intended).
If the version problem you describe, @joukewitteveen, occurs while you're still working on the project -- for example, when a new software version introduces a critical bug -- then version-pinning or direct source-code download (as introduced for some software in containerit) might be a solution.
But again: this is IMHO only a temporary solution until you finish your work on the project and store everything in a neatly packed virtual machine image.
Let me restate that I am not currently having any problems. This issue is (to me) conceptual.
Although I agree that saving a complete image is probably the best solution for future reproducibility, this is often not very practical and many software archives maintain copies of older versions, so there may be hope. Either way, it would be beneficial if a compendium contained a list of dependencies with exact versions, for instance in a yaml file. A helper script can then try to provision the docker image with the exact versions of these dependencies (alleviating the author of this task), while another script may link the dependencies in this file to a database with known technical issues, so that a compendium can be flagged automatically if it relies on a piece of software with known problems.
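As a sketch of what such a dependency list and provisioning helper could look like: the file format, field names, and helper functions below are all hypothetical, invented only to make the idea concrete.

```python
# Hypothetical sketch: a compendium-level dependency list with exact
# versions, and a helper that emits a version-pinned install command.

deps_yaml = """\
dependencies:
  libfoo: 1.0.1
  libbar: 2.3.1
"""

def parse_deps(text):
    """Tiny ad-hoc parser for the flat 'dependencies:' mapping above."""
    deps = {}
    in_block = False
    for line in text.splitlines():
        if line.strip() == "dependencies:":
            in_block = True
        elif in_block and ":" in line:
            name, version = (part.strip() for part in line.split(":", 1))
            deps[name] = version
    return deps

def pinned_install_command(deps):
    """Build a version-pinned apt-get install line from the mapping."""
    pins = " ".join(f"{pkg}={ver}" for pkg, ver in sorted(deps.items()))
    return f"apt-get install -y {pins}"

print(pinned_install_command(parse_deps(deps_yaml)))
# apt-get install -y libbar=2.3.1 libfoo=1.0.1
```

A second helper could feed the same parsed mapping into a known-flaws lookup, so that provisioning and flagging both work from one file.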
Well, there are at least two projects that attempt to provide a catalog of install commands for system requirements: r-system-requirements and sysreqsdb.
A system like the one you describe could be added there as another layer, I guess.
I'm going to take a closer look at the containerit package to see if that can help us with this.