Yelp/dumb-init

1.2.1 FTBFS on mips (test failures)

onlyjob opened this issue · 5 comments

Thanks for reporting this! My guess is that this is a test flake (or similar), since there are barely any code changes between 1.2.0 and 1.2.1 (just a one-line bugfix), but it's hard to say for sure. Unfortunately I'm not really sure how to test this without access to a mips machine (looks like maybe QEMU could work?).

Is it possible to rerun the build on mips* to check if it's failing consistently or just flaky?

I don't have access to any mips hardware... I think the tests fail consistently, as all mips architectures are affected (not just one or two)...

1.2.2 is even worse; it also fails on armel and armhf: https://buildd.debian.org/status/package.php?p=dumb-init&suite=sid

That's really odd... I was almost certain 1.2.2 would improve things since it includes #174 which was causing some test flakiness for us, especially in some environments like Travis.

Linking in a specific build log: https://buildd.debian.org/status/fetch.php?pkg=dumb-init&arch=armel&ver=1.2.2-1&stamp=1548572571&raw=0 (not sure if these logs are kept forever, so I've also backed it up here).

The captured stderr for the failed test is kind of interesting:


    [dumb-init] Running in debug mode.
    [dumb-init] Unable to detach from controlling tty (errno=25 Inappropriate ioctl for device).
    [dumb-init] Child spawned with PID 9146.
    [dumb-init] Unable to attach to controlling tty (errno=25 Inappropriate ioctl for device).
    [dumb-init] setsid complete.
    [dumb-init] Received signal 17.
    [dumb-init] A child with PID 9146 exited with exit status 0.
    [dumb-init] Forwarded signal 15 to children.
    [dumb-init] Child exited with status 0. Goodbye.

Both of the tracebacks look like this:

        # read a line from print_signals, figure out its pid
        line = proc.stdout.readline()
        match = re.match(b'ready \(pid: ([0-9]+)\)\n', line)
>       assert match, line
E       AssertionError: 
E       assert None

Weird that there's an empty line that gets read, I wonder where that comes from...
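
One thing that may help narrow this down: `readline()` on a pipe returns an empty bytes string only at end of file, while a genuinely blank line comes back as `b'\n'`. If the empty assertion message means `line` was empty, that would be consistent with the child closing stdout before printing its ready line, rather than with a stray blank line. A minimal sketch of the difference, assuming Python 3; the `first_line` helper and the child snippets are hypothetical and are not the actual test code:

```python
import re
import subprocess
import sys

# Same pattern as in the traceback, written as a raw bytes literal.
READY = re.compile(rb'ready \(pid: ([0-9]+)\)\n')


def first_line(code):
    """Run a Python snippet as a child process and return the first stdout line.

    Hypothetical helper for illustration only.
    """
    with subprocess.Popen(
        [sys.executable, '-c', code], stdout=subprocess.PIPE,
    ) as proc:
        return proc.stdout.readline()


# Child prints the expected ready line.
printed_ready = first_line(
    "import sys; sys.stdout.buffer.write(b'ready (pid: 123)\\n')"
)
# Child prints a blank line: readline() still returns the newline.
printed_blank = first_line("import sys; sys.stdout.buffer.write(b'\\n')")
# Child exits without writing anything: readline() returns b'' (EOF).
printed_nothing = first_line("pass")

print(repr(printed_ready), repr(printed_blank), repr(printed_nothing))
```

If that reading is right, asserting on `line != b''` separately before matching the pattern would make the two failure modes distinguishable in future logs.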

Ultimately I don't know how we're going to try to fix this without being able to reproduce it locally; there's just not enough information in the log. Maybe I can try to rig something up with QEMU so that we can test on these architectures.

It looks like dumb-init is building everywhere now:
https://buildd.debian.org/status/package.php?p=dumb-init&suite=unstable

I suspect this patch helped: https://sources.debian.org/src/dumb-init/1.2.2-1.1/debian/patches/increase-test-sleep-time.patch/

(We also applied that here in 0708f27 so that patch can be dropped on the Debian side after the next release.)
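
For context on why a longer sleep can help on slow builders: a fixed sleep passes only if the machine is fast enough to reach the expected state within that window. An alternative that is sometimes used is to poll for the condition up to a deadline. The sketch below is not what the linked patch or 0708f27 does (neither is reproduced here); `wait_until` and its parameters are hypothetical names chosen for illustration:

```python
import time


def wait_until(predicate, timeout=5.0, interval=0.05):
    """Poll predicate() until it is truthy or `timeout` seconds have passed.

    Returns True if the predicate became truthy in time, False otherwise.
    Hypothetical helper, not part of dumb-init's test suite.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Usage: a condition that only becomes true on the third check.
calls = {'n': 0}


def ready_on_third_check():
    calls['n'] += 1
    return calls['n'] >= 3


became_ready = wait_until(ready_on_third_check, timeout=2.0, interval=0.01)
timed_out = wait_until(lambda: False, timeout=0.1, interval=0.01)
```

The trade-off is that fast machines finish as soon as the condition holds, while slow ones get the full timeout before the test is declared failed.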

Closing this out as I think this is fixed, let me know if I'm misunderstanding!