nix-rust/nix

Dir::open returns EBADF in new glibc

pxeger opened this issue · 4 comments

I tested this with nix 0.27.1:

use nix::dir::Dir;
use nix::fcntl::OFlag;
use nix::sys::stat::Mode;

fn main() {
    let dir = Dir::open("/etc", OFlag::O_DIRECTORY | OFlag::O_PATH, Mode::empty()).unwrap();
    println!("ok: {dir:?}");
}

Under glibc 2.39 (Arch Linux, 2.39-1), this code errors with

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: EBADF', src/main.rs:6:84
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

seemingly for no good reason!

No error is returned by the underlying system call:

...
openat(AT_FDCWD, "/etc", O_RDONLY|O_PATH|O_DIRECTORY) = 3
fstat(3, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
fcntl(3, F_GETFL)                       = 0x210000 (flags O_RDONLY|O_PATH|O_DIRECTORY)
close(3)                                = 0
...

and it doesn't happen when calling openat directly in C:

#define _GNU_SOURCE
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main() {
    int fd = open("/etc", O_RDONLY | O_PATH | O_DIRECTORY);
    if (fd < 0) return perror("open"), 1;
    printf("%s: %d\n", "ok", fd);
    close(fd);
}

This prints

ok: 3

It also works fine with nix::fcntl::open instead of nix::dir::Dir::open.

This bug doesn't happen with Arch Linux's glibc 2.36-6; there it works fine and prints:

ok: Dir(0x6507c4e8daa0)

I haven't had time to bisect glibc versions any further, to establish what change has caused this bug, and whether it is nix's fault or glibc's.

I am on glibc 2.38, and that code works without issue:

$ ldd --version
ldd (GNU libc) 2.38
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.

$ cat src/main.rs
use nix::dir::Dir;
use nix::fcntl::OFlag;
use nix::sys::stat::Mode;

fn main() {
    let dir = Dir::open("/etc", OFlag::O_DIRECTORY | OFlag::O_PATH, Mode::empty()).unwrap();
    println!("ok: {dir:?}");
}

$ rg nix Cargo.toml
11:nix = { version = "0.27.1" , features = ["mman", "term", "fs", "process", "dir"]}

$ cargo r -q
ok: Dir(0x56529d85dba0)

I tried running an Arch Linux container with podman, but its glibc is also 2.38:

$ podman pull archlinux

$ podman run -it -d archlinux
98eec04dc5d6f2de27f817ffec2c78600351d2c2adebce3faeba7404dd84e7ba

$ podman exec -it 98 bash

[root@98eec04dc5d6 /]# cat /etc/os-release
NAME="Arch Linux"
PRETTY_NAME="Arch Linux"
ID=arch
BUILD_ID=rolling
VERSION_ID=20240101.0.204074
ANSI_COLOR="38;2;23;147;209"
HOME_URL="https://archlinux.org/"
DOCUMENTATION_URL="https://wiki.archlinux.org/"
SUPPORT_URL="https://bbs.archlinux.org/"
BUG_REPORT_URL="https://bugs.archlinux.org/"
PRIVACY_POLICY_URL="https://terms.archlinux.org/docs/privacy-policy/"
LOGO=archlinux-logo

[root@98eec04dc5d6 /]# ldd --version
ldd (GNU libc) 2.38
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.

Since I do not have an environment with glibc 2.39, could I trouble you to investigate where that EBADF originates? The implementation of Dir::open() is quite simple:

    pub fn open<P: ?Sized + NixPath>(
        path: &P,
        oflag: OFlag,
        mode: sys::stat::Mode,
    ) -> Result<Self> {
        let fd = fcntl::open(path, oflag, mode)?;  // openat(2)
        Dir::from_fd(fd)                           // fdopendir(3)
    }

That errno can come from either openat(2) or fdopendir(3). Once we know which one, we can look at the code diff between glibc 2.38 and 2.39 and possibly solve the issue.

Though from your strace output, openat(2) returns successfully...

It does indeed come from fdopendir, and the same error occurs when calling it from C:

#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/stat.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main() {
    int fd = open("/etc", O_RDONLY | O_PATH | O_DIRECTORY);
    if (fd < 0) return perror("open"), 1;
    printf("%s: %d\n", "ok", fd);
    DIR *d = fdopendir(fd);
    if (d == 0) return perror("fdopendir"), 1;
    printf("%s: %d\n", "ok", fd);
    closedir(d);
}

This fails with:

fdopendir: Bad file descriptor

So I guess it's just a bug in glibc.

(FYI you can get glibc 2.39 in the container by running pacman -Syu)

In fact it's a bug fix in glibc, which means it was my code that was buggy: fdopendir is not designed to be used with O_PATH. Sorry for the needless report!