problems with eudev in initramfs and userspace
fluxer opened this issue · 20 comments
I'm using eudev in both initramfs and userspace here. All was fine until I upgraded from 1.10 to 2.1.1; now starting udevd from the initscripts just hangs the system. I've tried to find the reason, and it seems to be some sort of conflict between udevd from the initramfs and udevd from userspace, because when I try to start it from userspace with --debug it prints errors about failing to bind an address. I googled those and found similar reports such as https://bugs.launchpad.net/ubuntu/+source/udev/+bug/787610 and https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=624469. I'm using neither Ubuntu nor Debian.
Apart from the eudev upgrade I have not done anything that would cause this behaviour, so it's definitely some change between 1.10 and 2.1.1. I did notice during the configure step that --with-firmware-path is invalid now, so my guess is that firmware loading from userspace is no longer supported and the alternative is a kernel built with CONFIG_FIRMWARE_IN_KERNEL=y, but that is already the case for the kernel I use.
Here is the kernel config: http://pastebin.com/nua9R7V7 (it will not expire). eudev was configured like this:
./configure --prefix=/ \
--datadir=/share \
--sysconfdir=/etc \
--localstatedir=/var/lib \
--with-rootprefix= \
--libdir=/lib \
--with-firmware-path="/lib/firmware/updates:/lib/firmware" \
--with-rootlibdir=/lib \
--exec-prefix=/ \
--enable-gudev \
--disable-manpages \
--disable-keymap \
--disable-static
# --enable-libkmod
The hook I use for eudev in initramfs:
msg "Linking modprobe..."
ln -s /bin/modprobe /sbin/modprobe
msg "Starting UDev daemon..."
udevd --daemon --resolve-names=never
msg "Triggering uevents..."
udevadm trigger --action=add --type=subsystems
udevadm trigger --action=add --type=devices
udevadm settle
msg "Exiting UDev daemon..."
udevadm control --exit
udevadm info --cleanup-db
And udevd is called like this in the initscripts:
udevd --daemon
udevadm trigger --action=add --type=subsystems
udevadm trigger --action=add --type=devices
Once udevd --daemon is reached I can do nothing but reboot. For now, as a temporary workaround, I've just commented out the lines mentioned above and the system (thankfully) is usable. Cheers!
Actually, the issue is not what I initially thought. It seems that attempting to capture the output of udevd blocks, which was not the case before. For example: output=$(udevd --daemon --debug). Obviously, --daemon has no effect. I'm aware that capturing the output of a process that forks may not work as expected (no output will be captured beyond the fork), but the initscripts use a single function that does this for every relevant command.
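A likely mechanism behind the hang, consistent with how this thread is eventually resolved, is that $(...) keeps reading until every process holding the write end of its pipe has closed it. A daemon that inherits stdout therefore keeps the shell waiting even after it has forked. The sketch below only illustrates that shell behaviour; it uses a backgrounded sleep as a stand-in for udevd, and the 2-second duration, the variable names, and the availability of `date +%s` are assumptions.

```shell
#!/bin/sh
# Stand-in for a daemon: a backgrounded sleep, NOT udevd itself.

# Case 1: the background process inherits the capture pipe as its stdout,
# so $(...) cannot see end-of-file until the sleep exits.
start=$(date +%s)
out=$( (sleep 2 &); echo "started" )
held=$(( $(date +%s) - start ))

# Case 2: the background process points stdout/stderr at /dev/null first,
# so the pipe is released as soon as the foreground part finishes.
start=$(date +%s)
out=$( (sleep 2 >/dev/null 2>&1 &); echo "started" )
released=$(( $(date +%s) - start ))

echo "inherited pipe: waited ${held}s; redirected: waited ${released}s"
```

In the first case the capture returns only after the sleep ends; in the second it returns almost immediately, even though the background process is still running.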
Yeah, capturing output from a daemon is wrong. Can't you use the logs?
I'm not aware of udev logging anywhere, but in any case that's not exactly what my initscripts try to do (printing individual programs' logs). They capture stdout as well as stderr and print every line with a prefix, so it becomes something like this:
* Starting UDev daemon
>> bind failed: Address already in use
>> error binding udev control socket
The star is supposed to be colored either dark blue or dark red depending on the exit status of the command, and the prefix is yellow (not that it matters much for this issue). As far as I know, eudev's udevd is the only program that cannot be wrapped as described above (in a sub-shell), at least in recent versions.
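For illustration, a wrapper of the kind described could look roughly like the sketch below. This is not the reporter's actual initscript function: the name run_step, the exact colors, and the layout are hypothetical, and only the general idea (capture stdout and stderr together, then print each line with a prefix) comes from the description above.

```shell
#!/bin/sh
# Hypothetical helper; run_step and its formatting are assumptions.
run_step() {
    desc=$1
    shift
    # Capture stdout and stderr together and remember the exit status.
    status=0
    output=$("$@" 2>&1) || status=$?
    if [ "$status" -eq 0 ]; then
        color='\033[34m'    # blue star on success
    else
        color='\033[31m'    # red star on failure
    fi
    printf '%b*%b %s\n' "$color" '\033[0m' "$desc"
    # Prefix every captured line with ">> ".
    if [ -n "$output" ]; then
        printf '%s\n' "$output" | sed 's/^/>> /'
    fi
    return "$status"
}

run_step "Demo command" sh -c 'echo to-stdout; echo to-stderr >&2'
```

Because the wrapper relies on command substitution, it returns only once the wrapped command and anything that inherited its stdout or stderr have closed those descriptors.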
Sorry for the delay (real life). udevd writes to stderr, which you cannot capture with output=$(...). Rather, you need to do something like output=$(udevd --daemon --debug 2>&1). To convince yourself, play with this program:
#include <stdio.h>

int main(void) {
    printf("I'm stdout.\n");
    fprintf(stderr, "I'm stderr.\n");
    return 0;
}
Then run:
$ output=$(./test)
I'm stderr.
$ echo $output
I'm stdout.
$ output=$(./test 2>&1)
$ echo "$output"
I'm stderr.
I'm stdout.
Let me know if this solves your problem.
No, it does not. In my initscripts both stdout and stderr are captured, so that is not the problem. The problem is that if an attempt is made to capture udevd's stdout, --daemon has no effect and the process does not fork.
Ah okay. Is this a changed behavior from the previous release?
Yes, that was not the case with 1.10 but is with 2.1.1.
Because of time constraints, I haven't been able to track down the commit. If you can find where the issue started, I'll see what I can do.
OK, I will start testing different releases (between 1.10 and 2.1.1) and then attempt to bisect down to the commit. On second thought, maybe I should give 3.0/3.1 a shot first?
You can. There was some upstream daemon rewriting.
@fluxer Thanks! Let me think about how to deal with this because we really don't want a daemon spewing output all over the place. I'm thinking about restructuring the flags so that this is an option for you. Would that work?
@blueness Take a look at https://github.com/gentoo/eudev/blob/master/src/udev/udevd.c#L1194, please. The descriptors are not closed, only duplicated. I propose the following patch:
diff --git a/src/udev/udevd.c b/src/udev/udevd.c
index 6cfb2bc..3ed49a1 100644
--- a/src/udev/udevd.c
+++ b/src/udev/udevd.c
@@ -1193,8 +1193,10 @@ int main(int argc, char *argv[]) {
if (fd >= 0) {
if (write(STDOUT_FILENO, 0, 0) < 0)
dup2(fd, STDOUT_FILENO);
+ close(STDOUT_FILENO);
if (write(STDERR_FILENO, 0, 0) < 0)
dup2(fd, STDERR_FILENO);
+ close(STDERR_FILENO);
if (fd > STDERR_FILENO)
close(fd);
} else {
I can open a pull request if you want.
@fluxer stdout/stderr are sent to /dev/null; closing them is generally a bad idea because many programs assume they are open on startup (think of udev rule helpers).
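The difference between a closed stdout and a stdout pointing at /dev/null can be seen from the shell. In this minimal sketch, `sh -c 'echo hello'` is a hypothetical stand-in for a udev rule helper that assumes stdout is open; the exact error text and exit status depend on the shell, so only "zero versus non-zero" is shown.

```shell
#!/bin/sh
# Run the same trivial "helper" twice: once with stdout closed, once with
# stdout pointed at /dev/null. The helper's own error message is discarded.
closed_rc=0
sh -c 'echo hello' 2>/dev/null >&- || closed_rc=$?

null_rc=0
sh -c 'echo hello' >/dev/null 2>&1 || null_rc=$?

echo "stdout closed:    exit status $closed_rc"
echo "stdout /dev/null: exit status $null_rc"
```

With stdout closed the write fails and the helper reports an error; with /dev/null the write succeeds and the output is simply discarded.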
@floppym I knew I was missing something. It didn't make sense to me when I first saw the code, because daemonizing requires closing the descriptors in use.
Reading docs about dup2 doesn't help much: some state that dup2 does not close the original descriptor and some state that it does (in some cases). Under what conditions the latter applies I have no idea, but I have not dug deep. If dup2 closes the descriptors in some cases, then the helpers may fail anyway; that would happen whenever udevd has actually forked, that is, whenever it is not started the way my example command (the one showing that --daemon has no effect) starts it. Basically, in all other cases.
Maybe a patch that just closes stdout/stderr and reopens them on /dev/null after daemonizing would make the forking work as expected and keep the helpers happy?
I don't know what documentation you are reading, but dup2 always closes any existing file descriptor before replacing it. In fact, this is guaranteed to happen atomically.
From dup2(2):
int dup2(int oldfd, int newfd);
...
The dup2() system call performs the same task as dup(), but instead of
using the lowest-numbered unused file descriptor, it uses the descriptor
number specified in newfd. If the descriptor newfd was previously
open, it is silently closed before being reused.

The steps of closing and reusing the file descriptor newfd are performed
atomically. This is important, because trying to implement
equivalent functionality using close(2) and dup() would be subject to
race conditions, whereby newfd might be reused between the two steps.
Such reuse could happen because the main program is interrupted by a
signal handler that allocates a file descriptor, or because a parallel
thread allocates a file descriptor.
And from the same man page:
If oldfd is a valid file descriptor, and newfd has the same value as oldfd, then dup2() does nothing, and returns newfd.
I was cross-referencing with this one: http://pubs.opengroup.org/onlinepubs/009695399/functions/dup.html which states:
If fildes is a valid file descriptor and is equal to fildes2, dup2() shall return fildes2 without closing it.
But anyway, I'll leave the patching to you.
For the next release, a simple revert of our b2399d9 (upstream's 5c67cf2) is sufficient. In the future I may make use of the code from terminal-util.c as they do in upstream's 40e749b.
Please test and let me know if this works for you. I'll be pushing out 3.1.2 in a few days.
It does, much appreciated.
Cheers!
@fluxer: Just to bring that dup2 conversation to a close:
oldfd is the file descriptor we get from open("/dev/null", O_RDWR), which is guaranteed to be an unused file descriptor. newfd is either STDOUT_FILENO (1) or STDERR_FILENO (2). In theory, oldfd could be either 1 or 2 if stdout or stderr were already closed when we called open; in that case, there is no need to close them again!
So basically, that caveat in the manual does not apply to this situation.