linux-x86: Most git operations fail with "failed to stat <some file>"
senekor opened this issue · 38 comments
Update/TLDR
It looks like the issue is that cargo binstall jj-cli
installs a broken binary from QuickInstall for jj 0.18 on linux x86 systems.
Fix:
The problem is solved on recent versions of cargo-binstall
. Make sure you're running the latest one before reinstalling jj to solve the problem:
cargo binstall cargo-binstall
cargo binstall --force jj-cli
We'd appreciate a confirmation that the fix works for people.
Alternatively, or if you want to be doubly sure, you can run
cargo binstall --strategies crate-meta-data --force jj-cli
Description
I just updated to 0.18 and most jj commands fail (some are fine, e.g. log
) with similar error messages:
$ jj new x
Error: Git operation failed
Caused by: failed to stat '/home/senk/.gitconfig'; class=Config (7)
(I do have a git config, it just lives in ~/.config/git/config
.)
If I touch that file, the process repeats with ./.git/info/grafts
and ./.git/shallow
.
Finally, git operations of jj seemingly work again (at least jj new
).
However, using the git cli itself I get a warning about .git/info/grafts
being deprecated.
So that doesn't seem like a good solution to just make these dummy files.
Steps to Reproduce the Problem
This is the second computer I updated and the first didn't show this problem.
So I don't know how to reproduce it yet.
Expected Behavior
Git operations should succeed as normal.
Actual Behavior
$ jj new x
Error: Git operation failed
Caused by: failed to stat '/home/senk/.gitconfig'; class=Config (7)
Specifications
- Platform: Linux
- Version:
6.8.11-300.fc40.x86_64
This happens in all repositories, even when I do a fresh jj git init --colocate
in a previously only git repo.
It does NOT happen in a repo that isn't colocated. Only noticing it now because I default to colocated repos.
Confirming I'm seeing this in my collocated repos as well with the upgrade to 0.18.0
EDIT: I can't even fresh clone a repo actually, it's not even a matter of colocation.
EDIT2: Even if I add an empty grafts and shallow file other operations like git fetch fail with more "failed to stat"
❯ jj git fetch
Error: failed to stat '/home/michael/dotfiles/.jj/repo/store/git/objects/pack/pack_git2_4978ca90985didx': Resource temporarily unavailable; class=Os (2)
I tired to reproduce this briefly on a Mac, but haven't been able to yet. I can also try Linux later. So, I'm totally guessing in the dark.
It might be worth double-checking that it's not some sort of a temporary problem with the /home filesystem. Do the files it's complaining about exist? Would rebooting the computer make a difference? Do plain git
commands work in that repo?
It might also be some sort of incompatibility between a libgit2 and/or gix version and/or some git version. Which version of Git do you use?
It would also be very helpful if one if you can build jj from source and bisect the v0.17.0..v0.18.0 range.
❯ jj git fetch Error: failed to stat '/home/michael/dotfiles/.jj/repo/store/git/objects/pack/pack_git2_4978ca90985didx': Resource temporarily unavailable; class=Os (2)
Seems like underlying libgit2 code paths are totally broken. It's weird that fstat returned EAGAIN. Running with strace
might show something wrong.
Btw, which Linux distribution and version are you using?
I'm running Fedora 40. The thing is that my personal computer doesn't have this issue, only my work computer. (Same setup for all intents and purposes. Obv. not the exact same hardware.)
I'll find some time later today to go to the office and do the bisect.
Well, installing from source works perfectly fine 😕 I'll try to reproduce the CI build more closely, starting with using musl instead of glibc.
This is quite strange. I built the v0.18.0
tag in an Ubuntu container with the musl target... and it works fine. If I cargo-binstall it, the problem is back. At this point, I don't know what to try anymore to reproduce it. It seems almost like something went wrong with the build, rather than the source.
Have you been using cargo install --locked
or cargo build
to build it, as opposed to a plain cargo install
?
If so, another remote possibility (reaching at straws here) is that perhaps the binary uses some CPU instruction that your processor doesn't support (got optimized for Intel-only instructions or AMD-only or ??).
Also, the release binaries are built with Rust 1.76. I'd be surprised if this made a difference...
I used cargo build
. I don't really know about the nuances of what those different commands do. Let me know if I should try something else.
My home pc has an amd cpu, the office one is intel (11900k I think?).
I'm using the latest rust version (1.78).
cargo build
should use Cargo.lock
and therefore the same dependencies as the release build.
If you installed Rust with rustup, you could relatively easily install 1.76 toolchain (rustup toolchain add 1.76
, I believe, and then cargo +1.76 build
).
I guess it's possible that the binary doesn't work correctly on Intel CPUs. We can ask on Discord, that's probably the easiest way to find out.
Shoot, looks like I made a mistake. Apparantly I had a self-compiled version of jj installed on my personal machine all along. Now that I installed the precompiled binary here as well, I can at least reproduce at home. So Intel / AMD cpu doesn't seem to be the issue.
I'll try to compile myself with Rust 1.76 in a minute.
Compiling with 1.76 also doesn't cause any issues, jj works fine.
I had another idea that it might be the feature flags. I'm pretty sure I had them set just like in the CI workflow when testing early at the office. But I'll try again. oppenssl always gives me trouble with Rust builds 😢
Nope, the features don't make a difference either. Still at square one: Local builds are fine, prebuilt ones are borked.
What the... if I download the prebuilt binary manually, it seems to be fine as well. Only cargo binstall
causes issues! Now I have to figure out how and why I'm getting a different thing from binstall.
With the following command, the correct binary seems to be installed:
cargo binstall jj-cli --target x86_64-unknown-linux-musl
# ...
WARN The package jj-cli v0.18.0 (x86_64-unknown-linux-musl) has been downloaded from github.com
# ...
Notice the snippet of the output (github.com). This is different without the explicit target:
cargo binstall jj-cli
# ...
WARN The package jj-cli v0.18.0 (x86_64-unknown-linux-gnu) has been downloaded from third-party source QuickInstall
# ...
So, QuickInstall. It looks like they also provide prebuilt releases for some rust programs. Presumably, the borked binaries are downloaded from this release:
https://github.com/cargo-bins/cargo-quickinstall/releases/tag/jj-cli-0.18.0
I have no idea how the QuickInstall system works. But it seems they only provide binaries for x86_64-unknown-linux-gnu (the one we don't provide).
Would there be any downside to providing gnu and musl binaries for linux? It would probably be even easier than adding aarch64 binaries. It's a little redundant since musl works fine everywhere, but it's an easy solution.
I'll investigate the QuickInstall system a bit to maybe figure out what went wrong there.
I tried "pinning" this issue and putting the workaround in the first comment.
Great! The flag --strategies crate-meta-data
works for me.
The quick-install version is built using glibc 2.17 to maintain backwards compatibility with centos 7, perhaps that's the reason why it failed?
I could:
- set glibc version higher for jj on quick-install
- enable gix on quick-install
to workaround this issue
@NobodyXu good questions, thank you for looking into it! Unfortunately, I don't know about glibc.
Re enabling gix, this will not get rid of the libgit2 dependency. Pushing and pulling is only implemented with libgit2 in jj
currently.
Pushing and pulling is only implemented with libgit2 in jj currently.
Not sure about pushing, but pulling using gix is definitely doable.
The binary downloaded from https://github.com/cargo-bins/cargo-quickinstall/releases/tag/jj-cli-0.18.0 appears to link some of the libc functions statically.
For example, git_config_add_file_ondisk
calls statically-linked stat64
->fstatat64
. OTOH, it refers to dynamically-linked errno
.
(gdb) disas git_config_add_file_ondisk
Dump of assembler code for function git_config_add_file_ondisk:
...
0x0000000000ee4aee <+46>: mov rbp,rdi
0x0000000000ee4af1 <+49>: lea rsi,[rsp+0x10]
0x0000000000ee4af6 <+54>: mov rdi,rbx
0x0000000000ee4af9 <+57>: call 0x1274b70 <stat64>
0x0000000000ee4afe <+62>: test eax,eax
0x0000000000ee4b00 <+64>: js 0xee4b8e <git_config_add_file_ondisk+206>
...
0x0000000000ee4b8e <+206>: call 0x1274ca0 <__errno_location@plt>
0x0000000000ee4b93 <+211>: mov eax,DWORD PTR [rax]
0x0000000000ee4b95 <+213>: cmp eax,0x2
0x0000000000ee4b98 <+216>: je 0xee4b06 <git_config_add_file_ondisk+70>
0x0000000000ee4b9e <+222>: cmp eax,0x14
0x0000000000ee4ba1 <+225>: je 0xee4b06 <git_config_add_file_ondisk+70>
0x0000000000ee4ba7 <+231>: lea rsi,[rip+0xffffffffff399803] # 0x27e3b1
0x0000000000ee4bae <+238>: mov edi,0x7
0x0000000000ee4bb3 <+243>: mov rdx,rbx
0x0000000000ee4bb6 <+246>: xor eax,eax
0x0000000000ee4bb8 <+248>: call 0xeed080 <git_error_set>
0x0000000000ee4bbd <+253>: jmp 0xee4b79 <git_config_add_file_ondisk+185>
(gdb) disas fstatat64
Dump of assembler code for function fstatat64:
0x0000000001274bf0 <+0>: push rbp
0x0000000001274bf1 <+1>: mov rbp,rsp
0x0000000001274bf4 <+4>: mov eax,0x106
0x0000000001274bf9 <+9>: mov r10d,ecx
0x0000000001274bfc <+12>: syscall
0x0000000001274bfe <+14>: xor ecx,ecx
0x0000000001274c00 <+16>: cmp eax,0xfffff001
0x0000000001274c05 <+21>: jb 0x1274c18 <fstatat64+40>
0x0000000001274c07 <+23>: neg eax
0x0000000001274c09 <+25>: mov rcx,0xfffffffffffffff8
0x0000000001274c10 <+32>: mov DWORD PTR fs:[rcx],eax
0x0000000001274c13 <+35>: mov ecx,0xffffffff
0x0000000001274c18 <+40>: mov eax,ecx
0x0000000001274c1a <+42>: pop rbp
0x0000000001274c1b <+43>: ret
(errno = -ret
is inlined?)
I think that's the source of the problem.
Quickinstall uses cargo-zigbuild to cross compile to glibc 2.17, perhaps that's the souce of the error?
Like @senekor I've confirmed that if I install via cargo install jj-cli --locked
this problem does not exist. My initial problem was with cargo binstall
which will install the binaries from the release and if there are none default to cargo install
Two questions:
- If we created our own dynamically linked glibc builds, and published them in a release, would
cargo binstall
use those instead of the upstreamzig
version that's apparently broken? - Alternatively, is it possible to get
cargo binstall
to use the musl binary in the meantime? Or permanently, on all systems?
I ask because it would be nice of us to mitigate this for our users ASAP, so they don't run into it. We can/should also update our docs I suppose in the meantime, but I think it's important to stop the bleeding, so to speak.
In either case, I would prefer if cargo binstall
would only install our binaries from our release archives by default, because that's what we CI and test against, and it would be really nice if that could work OOTB without any extra flags — so that users can get working results even if they mistype something or just run cargo binstall
blindly without looking too hard.
If we created our own dynamically linked glibc builds, and published them in a release, would cargo binstall use those instead of the upstream zig version that's apparently broken?
I believe so. Of course, then it would be us to make a glibc-linked binary that works on many linux systems, which I don't have much experience with, but you might.
Alternatively, is it possible to get cargo binstall to use the musl binary in the meantime? Or permanently, on all systems?
In either case, I would prefer if cargo binstall would only install our binaries from our release archibes by default, because that's what we CI and test against, and it would be really nice if that could work OOTB without any extra flags — so that users can get working results even if they mistype something or just run cargo binstall blindly without looking too hard.
AFAIU, there's no way currently, but this is now tracked in cargo-bins/cargo-binstall#1721.
A possible workaround is to lie and make a copy of https://github.com/martinvonz/jj/releases/download/v0.18.0/jj-v0.18.0-x86_64-unknown-linux-musl.tar.gz named https://github.com/martinvonz/jj/releases/download/v0.18.0/jj-v0.18.0-x86_64-unknown-linux-gnu.tar.gz.
I'd rather not, though.
If we created our own dynamically linked glibc builds, and published them in a release, would cargo binstall use those instead of the upstream zig version that's apparently broken?
Yes.
Alternatively, is it possible to get cargo binstall to use the musl binary in the meantime? Or permanently, on all systems?
You can use --disable-strategies quick-install
Alternatively, is it possible to get cargo binstall to use the musl binary in the meantime? Or permanently, on all systems?
You can use
--disable-strategies quick-install
I think the question is whether we can do this from the project's side. Asking every user to do this manually isn't reliable. I'm personally so used to binstall working well that I often don't check a projects installation documentation and just binstall it.
I do wonder: Can't you disable building jj and delete the broken release on the quickinstall side? Is it setup in a way that this is impossible or difficult?
We actually have recommended the correct command in https://martinvonz.github.io/jj/latest/install-and-setup/#cargo-binstall for a long time, but I expect binstall
users are less likely to read that page than an average user.
I think the question is whether we can do this from the project's side. Asking every user to do this manually isn't reliable. I'm personally so used to binstall working well that I often don't check a projects installation documentation and just binstall it.
It's tracked in cargo-bins/cargo-binstall#1721 , though I'm quite busy recently.
I do wonder: Can't you disable building jj and delete the broken release on the quickinstall side? Is it setup in a way that this is impossible or difficult?
Yes, I can remove the broken release, though it might be built again.
I think it might be a bug in rust-cross/cargo-zigbuild#255 or ziglang, since cargo-quickinstall use it to build for glibc 2.17
Now that this PR is merged, the problem should go away with the next release of cargo-binstall
. Of course, people will be running outdated versions for some time, so we might want to keep this issue pinned for a little longer.
cargo-binstall 1.7.1 has released, cargo-bins/cargo-binstall#1741 which fixed this issue is released in 1.7.0
Issue can be closed. Failure was due to combination of glibc target below 2.33 and compilation via Zig prior to 0.12.0
.
(
errno = -ret
is inlined?)I think that's the source of the problem.
It would be fine with glibc 2.33+ where these calls became proper functions AFAIK (see the referenced quotes in link below).
Prior to glibc 2.32 the problem is Zig specific, which they resolved in 0.12.0
👍
Details can be found here: rust-cross/cargo-zigbuild#255 (comment)