Repoctl is backing up files without any good reason
Closed this issue · 39 comments
Hi @cassava,
I just lost the entire repository twice today; the files were moved to the backup directory.
The only tasks used were repoctl update and repoctl add <some kernels>.
I'm using the 0.21 release.
This time repoctl --debug status -mca doesn't show anything wrong 😢
So sorry! I'm looking into it.
Luckily I logged the moment it backed up everything this second time:
Copying and adding to repository: linux-tkg-pds-bobcat-5.8.7-12-x86_64.pkg.tar.zst{,.sig}
Adding package to database: /srv/http/chaotic-aur/x86_64/linux-tkg-pds-bobcat-5.8.7-12-x86_64.pkg.tar.zst
error: read package /srv/http/chaotic-aur/x86_64/linux-tkg-pds-bobcat-5.8.7-12-x86_64.pkg.tar.zst: invalid input: magic number mismatch.
Could you post the output of repoctl version?
repoctl version
repoctl version 0.21 (30 August, 2020)
Copyright 2016-2020, Ben Morgan <cassava@iexu.de>
You may find repoctl on the Internet at
https://github.com/cassava/repoctl
Please report any bugs you may encounter.
The source code of repoctl is licensed under the MIT license.
Current configuration:
columnate = false
color = "auto"
quiet = false
current_profile = "default"
default_profile = "default"
[profiles.default]
repo = "/srv/http/chaotic-aur/x86_64/chaotic-aur.db.tar.zst"
add_params = []
rm_params = []
ignore_aur = []
require_signature = false
backup = true
backup_dir = "/srv/http/chaotic-aur/archive/"
interactive = false
pre_action = ""
post_action = ""
That looks like the output of the bug that already got fixed... hmm. I wonder if the repoctl-git version is different from the one I packaged.
This was with aur.archlinux.org/packages/repoctl
Oh boy...
I got it from the pacman's cache:
https://lonewolf.pedrohlc.com/.hidden/repoctl-0.21-1-x86_64.pkg.tar.zst
Ah, you might consider using repoctl-0.21-3.
I may have missed updating it. I'll have to wait for the recompilation cycle to get to the letter R.
My guess is that might be it. I might have messed up the PKGBUILD for the Go module migration, which could result in local Go modules being used instead of the vendored ones... This was before I followed the updated Arch Go packaging guidelines for modules.
And the error you describe here is one that got fixed in one of the dependencies of repoctl. So there might be that mismatch.
Also, I downloaded the package that caused the trouble and followed your procedure locally and didn't have any trouble, so at least I can't reproduce it with 0.21-2 and 0.21-3.
One thing I noticed: when magic number mismatch happens with repoctl update, no files are backed up and it just fails; when it happens with repoctl add, it's catastrophic.
And another thing: What happens when the file didn't finish writing? I have some async tasks and they may be trying to add files before they finished writing...
EDIT: I've updated to 0.21-3, I'll keep you posted...
Oh interesting, I should look into that. I think I'm starting to understand the backup behavior. Has to do with how repoctl reads all data first and then tries to act on it.
I think I need to add a separate use case for "new file exists and I can't read it", because I did not consider this originally. This would solve both problems at once.
Also, a partially-written archive would pose some problems, because repoctl only reads as much of a package as it needs to; currently I'm relying on repo-add to handle the case of an incomplete file.
Yeah, repo-add failing and consequently repoctl exiting with a failure code too is enough. That's how it has been working for the past year, and how it would work with pure repo-add too. The server will reattempt it later on failures...
So one thing I could definitely do is have repoctl add verify the packages before copying them to the repository. Currently it just copies them over and trusts in repo-add.
Luckily I logged the moment it backed up everything this second time:
Copying and adding to repository: linux-tkg-pds-bobcat-5.8.7-12-x86_64.pkg.tar.zst{,.sig}
Adding package to database: /srv/http/chaotic-aur/x86_64/linux-tkg-pds-bobcat-5.8.7-12-x86_64.pkg.tar.zst
error: read package /srv/http/chaotic-aur/x86_64/linux-tkg-pds-bobcat-5.8.7-12-x86_64.pkg.tar.zst: invalid input: magic number mismatch.
And you're saying that adding this package to database actually caused all the packages in the repository to be backed up?
Alright then that change is now on master with 4822d1f.
Moved to it.
By the end of the recompilation cycles, I'll let you know if something goes wrong.
After changing to -3, no error has happened yet!
And you're saying that adding this package to database actually caused all the packages in the repository to be backed up?
Yeah, the full logs have a bunch of "Backing up..." after this.
The full log:
https://lonewolf.pedrohlc.com/.hidden/downinahole.log
So bizarre that I can't get that backup-behavior reproduced at all... :-(
I have been using 0.21 since #57 was closed, and only now did it happen (and then again, but after building 800 packages in a short period of time).
I think that sounds like a race condition issue.
Do you run repoctl update while building other packages?
I do. The server has 40 vCPUs and I don't like to leave any of them idle, so it's a chaos of down, update, and, when files come from a third cluster, add.
Ok... that explains a lot.
This would have been very useful to know earlier. So far I haven't considered the ramifications of parallel updates and adds. This is a tricky one.
Actually I'd also be interested in hearing any pain-points you might have in building that many packages.
For example: I've always found the situation difficult where you need to build newer dependencies that then need to be installable for the next makepkg -s command.
I think it would be better to create a new issue specifically for the use-case "Support parallel execution of repoctl".
Somehow I managed that. My first infra has a "batch" command, and I execute it like this:
chaotic-batchbuild somepackage anotherpackage -- apackagethatdepends
And I wrapped the "add" command: it waits for a lock to be deleted before running a secondary repoctl update.
(And the second one has a db-bump command that does almost the same.)
Sometimes I still get:
error: read package /srv/http/chaotic-aur/x86_64/hamsket-git-r1222.fe82ff7-1-x86_64.pkg.tar.zst: invalid input: magic number mismatch.
Adding package to database: /srv/http/chaotic-aur/x86_64/gstreamer0.10-base-0.10.36-13-x86_64.pkg.tar.zst
Adding package to database: /srv/http/chaotic-aur/x86_64/gstreamer0.10-base-plugins-0.10.36-13-x86_64.pkg.tar.zst
Sometimes it's uglier:
error: read package /srv/http/chaotic-aur/x86_64/gnome-shell-extension-xrdesktop-git-0.14.0.29.9c5c0c3-1-any.pkg.tar.zst: cannot find file ".PKGINFO".
error: read package /srv/http/chaotic-aur/x86_64/mkinitcpio-openswap-0.1.0-3-any.pkg.tar.zst: cannot find file ".PKGINFO".
error: read package /srv/http/chaotic-aur/x86_64/pango-anydesk-1:1.43.0-3-x86_64.pkg.tar.zst: invalid input: magic number mismatch.
error: read package /srv/http/chaotic-aur/x86_64/perl-authen-simple-0.5-9-any.pkg.tar.zst: cannot find file ".PKGINFO".
error: read package /srv/http/chaotic-aur/x86_64/qomui-git-0.8.2.r22.23650ab-1-x86_64.pkg.tar.zst: invalid input: magic number mismatch.
error: read package /srv/http/chaotic-aur/x86_64/ripcord-arch-libs-0.4.26-1-x86_64.pkg.tar.zst: invalid input: magic number mismatch.
error: read package /srv/http/chaotic-aur/x86_64/woeusb-ng-0.2.5-3-any.pkg.tar.zst: invalid input: magic number mismatch.
Adding package to database: /srv/http/chaotic-aur/x86_64/tpmmanager-0.8.1-8-x86_64.pkg.tar.zst
But these packages are still being added to the repo...
As the catastrophic event seems to have ceased, I'm closing this issue.
Hey @PedroHLC, the errors you are seeing there are to be expected when repoctl reads tar.zst files that are still being written.
cannot find file ".PKGINFO" happens when the Zst decompression succeeds far enough that the TAR reader can start processing the archive, but it doesn't find the .PKGINFO file that is supposed to be in the TAR.
invalid input: magic number mismatch happens when the Zst decompression fails because not enough of the file has been written.
If repoctl encounters these files, it should just ignore them.
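As a rough illustration of the "magic number mismatch" case, here is a minimal Python sketch (not repoctl's actual code, which is written in Go) that checks whether a file begins with the four-byte Zstandard frame magic number. The function name has_zstd_magic is made up for this example. A file that passes this check can still be incomplete, so it only catches files that are empty, shorter than four bytes, or not Zstandard at all.

```python
# Zstandard frame magic number 0xFD2FB528, stored little-endian on disk.
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"


def has_zstd_magic(path):
    """Return True if the file starts with the Zstandard frame magic.

    Hypothetical helper for illustration only. Passing this check does
    NOT prove the archive is complete; it only rules out files whose
    first four bytes are missing or wrong.
    """
    with open(path, "rb") as f:
        return f.read(4) == ZSTD_MAGIC
```

A caller could skip any package file for which this returns False and try again on the next run, which matches the "just ignore them" behavior described above.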
Further final thoughts from me:
- Since this is hard to replicate, one way to reproduce this might be to truncate files at a certain number of bytes.
- Optimally, files that are in the process of being written or copied should be given an extension that repoctl ignores.
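The first suggestion above, truncating files at a certain number of bytes, could be tried with a small Python helper like the one below. This is only a sketch: make_truncated_copy and its arguments are hypothetical names, and it works on a copy so that the original package is never modified.

```python
import os
import shutil


def make_truncated_copy(src, dst, size):
    """Copy src to dst, then cut dst down to at most `size` bytes.

    Hypothetical helper for simulating a package that is still being
    written. The original file at src is left untouched.
    """
    shutil.copyfile(src, dst)
    # Never extend the file: truncate() with a larger size would pad it.
    new_size = min(size, os.path.getsize(src))
    with open(dst, "r+b") as f:
        f.truncate(new_size)
    return dst
```

Running the add or update step against copies truncated at several different sizes, in a scratch repository, might show which of the two read errors appears at which point.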
Quick question: Do you run repo add and repo update in parallel?
Do you run repo add and repo update in parallel?
I observed it today, and repoctl is not running in parallel. My lock wrapper is working and probably has been for the entire past year. I just don't avoid partially written files. But I'm considering using the same lock file for the copying operations...
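One way to share a single lock between the copying step and the repoctl step is sketched below in Python. Note that it swaps in a different mechanism from the wrapper described in this thread: instead of waiting for a lock file to be deleted, it uses an advisory flock(2) lock via fcntl.flock, which is Unix-only. The name repo_lock and the paths in the usage comment are assumptions for illustration.

```python
import contextlib
import fcntl
import os


@contextlib.contextmanager
def repo_lock(lock_path):
    """Hold an exclusive advisory lock on lock_path inside the with-block.

    Hypothetical helper. It only protects against other processes that
    take the same lock; anything that ignores the lock is not blocked.
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until the lock is free
        yield
    finally:
        os.close(fd)  # closing the descriptor releases the lock


# Possible usage (paths are placeholders):
#
#   with repo_lock("/path/to/repo.lock"):
#       shutil.copy2(package_path, repo_dir)
#       subprocess.run(["repoctl", "update"], check=True)
```

Because the lock is released when the descriptor is closed, a crashed process does not leave a stale lock behind, which is the main reason for choosing flock over a lock file that must be deleted.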
Sadly it happened once more, with repoctl add (and not running in parallel).
Good to know that it can also happen by itself! Debugging data-race issues is really, really hard, because a lot of behavior is just undefined, which can mean basically anything. But if it happens without any other instance running in parallel, then I might just have a chance to observe it myself.
If you ever manage to reproduce it reliably, that is of course the absolute best, but from the sound of it that doesn't happen.
Do you know if anything else was running at the same time, e.g. Pacman? I opted to not use libalpm, the Pacman libraries, because it was always annoying to have to recompile a tool like cower every time I updated pacman. But that means that I had to come up with the database reading myself, which isn't as battle-tested as that from Pacman.
Sadly it took me 40hrs to notice the packages were gone.
Thankfully one of the mirrors isn't syncing package deletions and I've been using it as a backup.
I had one entry showing as mixxx_beta-git: updated( -> r6814-1) in repoctl status.
This package wasn't even appearing in my dump with tar -tv --zstd -f chaotic-aur.db.tar.zst | awk '/^d/{print $6}'. And it was built near the time things went crazy.
I've added it with repo-add and now it's in the database and doesn't show in repoctl status anymore...
Do you know if anything else was running at the same time, e.g. Pacman?
It shouldn't be running, except inside some containers...
Over the years I've also noticed (and had reports of) local repository databases suddenly becoming empty. Never found out the cause either. In my case, aur-build does not seem to have data races either (all built packages are written to a random, private directory before being mv'd to the local repository, and repo-add has its own locking mechanism which I presume (?) to be functional).
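The write-elsewhere-then-mv pattern described above can be sketched in Python as follows. This is an illustration under assumptions, not aur-build's or repoctl's code: publish_package is a made-up name, and the ".incoming-" prefix and ".part" suffix are arbitrary choices that are only assumed to be ignored by whatever scans the repository directory.

```python
import os
import shutil
import tempfile


def publish_package(src, repo_dir):
    """Copy src into repo_dir under a temporary name, then rename it.

    Hypothetical helper. The final name only appears once the data is
    fully written, because os.replace is atomic on POSIX when both
    paths are on the same filesystem.
    """
    final = os.path.join(repo_dir, os.path.basename(src))
    fd, tmp = tempfile.mkstemp(dir=repo_dir, prefix=".incoming-", suffix=".part")
    try:
        with os.fdopen(fd, "wb") as out, open(src, "rb") as inp:
            shutil.copyfileobj(inp, out)
            out.flush()
            os.fsync(out.fileno())  # make sure the bytes reach the disk
        shutil.copymode(src, tmp)  # mkstemp creates the file as 0600
        os.replace(tmp, final)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    return final
```

Creating the temporary file inside repo_dir keeps the rename on one filesystem; a temporary directory on another mount would turn the final step back into a non-atomic copy.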