inconsistent/incomplete /dev/disk/by-id links
sharkcz opened this issue · 14 comments
We are experiencing a situation where the /dev/disk/by-id/... symlinks are inconsistent across reboots. Sometimes links for all disks/DASDs are present, sometimes only a (different) subset is present.
[root@openshift-8 ~]# ll /dev/disk/by-id/
total 0
lrwxrwxrwx. 1 root root 12 Dec 7 10:19 ccw-0X5422-part1 -> ../../dasda1
lrwxrwxrwx. 1 root root 12 Dec 7 10:19 ccw-0X5422-part2 -> ../../dasda2
lrwxrwxrwx. 1 root root 11 Dec 7 10:19 ccw-0X5622 -> ../../dasdc
lrwxrwxrwx. 1 root root 12 Dec 7 10:19 ccw-0X5622-part1 -> ../../dasdc1
lrwxrwxrwx. 1 root root 11 Dec 7 10:19 ccw-0X5722 -> ../../dasdd
lrwxrwxrwx. 1 root root 12 Dec 7 10:19 ccw-0X5722-part1 -> ../../dasdd1
lrwxrwxrwx. 1 root root 12 Dec 7 10:19 ccw-IBM.750000000FRB71.0230.22.00000000000027200000000000000000-part1 -> ../../dasda1
lrwxrwxrwx. 1 root root 12 Dec 7 10:19 ccw-IBM.750000000FRB71.0230.22.00000000000027200000000000000000-part2 -> ../../dasda2
lrwxrwxrwx. 1 root root 12 Dec 7 10:19 ccw-IBM.750000000FRB71.0230.22-part1 -> ../../dasda1
lrwxrwxrwx. 1 root root 12 Dec 7 10:19 ccw-IBM.750000000FRB71.0230.22-part2 -> ../../dasda2
lrwxrwxrwx. 1 root root 11 Dec 7 10:19 ccw-IBM.750000000FRB71.0232.22 -> ../../dasdc
lrwxrwxrwx. 1 root root 11 Dec 7 10:19 ccw-IBM.750000000FRB71.0232.22.00000000000027200000000000000000 -> ../../dasdc
lrwxrwxrwx. 1 root root 12 Dec 7 10:19 ccw-IBM.750000000FRB71.0232.22.00000000000027200000000000000000-part1 -> ../../dasdc1
lrwxrwxrwx. 1 root root 12 Dec 7 10:19 ccw-IBM.750000000FRB71.0232.22-part1 -> ../../dasdc1
lrwxrwxrwx. 1 root root 11 Dec 7 10:19 ccw-IBM.750000000FRB71.0233.22 -> ../../dasdd
lrwxrwxrwx. 1 root root 11 Dec 7 10:19 ccw-IBM.750000000FRB71.0233.22.00000000000027200000000000000000 -> ../../dasdd
lrwxrwxrwx. 1 root root 12 Dec 7 10:19 ccw-IBM.750000000FRB71.0233.22.00000000000027200000000000000000-part1 -> ../../dasdd1
lrwxrwxrwx. 1 root root 12 Dec 7 10:19 ccw-IBM.750000000FRB71.0233.22-part1 -> ../../dasdd1
[root@openshift-8 ~]# ll /dev/disk/by-id/
total 0
lrwxrwxrwx. 1 root root 11 Dec 7 10:48 ccw-0X5722 -> ../../dasdd
lrwxrwxrwx. 1 root root 12 Dec 7 10:48 ccw-0X5722-part1 -> ../../dasdd1
lrwxrwxrwx. 1 root root 11 Dec 7 10:48 ccw-IBM.750000000FRB71.0233.22 -> ../../dasdd
lrwxrwxrwx. 1 root root 11 Dec 7 10:48 ccw-IBM.750000000FRB71.0233.22.00000000000027200000000000000000 -> ../../dasdd
lrwxrwxrwx. 1 root root 12 Dec 7 10:48 ccw-IBM.750000000FRB71.0233.22.00000000000027200000000000000000-part1 -> ../../dasdd1
lrwxrwxrwx. 1 root root 12 Dec 7 10:48 ccw-IBM.750000000FRB71.0233.22-part1 -> ../../dasdd1
The environment is Fedora 35 with kernel-5.14.18-300.fc35.s390x and s390utils-core-2.17.0-2.fc35.s390x (the version shouldn't matter much, as etc/udev/rules.d/59-dasd.rules hasn't changed for a long time, except for the scheduler setting).
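To make it easier to see which links disappear between two boots, the listings can be compared mechanically. The following is only a minimal sketch: the function names `parse_links` and `missing_links` are made up for illustration, and it assumes the `ls -l` format shown above, where each symlink line ends in `name -> target`.

```python
import re

# Matches the tail of an `ls -l` symlink line: "<name> -> <target>".
LINK_RE = re.compile(r"(\S+) -> (\S+)\s*$")


def parse_links(listing):
    """Return {link name: target} for every symlink line in an `ls -l` listing."""
    links = {}
    for line in listing.splitlines():
        if not line.startswith("l"):
            continue  # skip the shell prompt, "total 0", and non-symlink entries
        match = LINK_RE.search(line)
        if match:
            links[match.group(1)] = match.group(2)
    return links


def missing_links(before, after):
    """Names present in the first listing but absent from the second, sorted."""
    return sorted(set(parse_links(before)) - set(parse_links(after)))


# Two lines taken from the first listing above and one from the second.
boot_a = """\
lrwxrwxrwx. 1 root root 11 Dec 7 10:19 ccw-0X5622 -> ../../dasdc
lrwxrwxrwx. 1 root root 11 Dec 7 10:19 ccw-0X5722 -> ../../dasdd
"""
boot_b = """\
lrwxrwxrwx. 1 root root 11 Dec 7 10:48 ccw-0X5722 -> ../../dasdd
"""

print(missing_links(boot_a, boot_b))  # ['ccw-0X5622']
```

Feeding it the full output of both `ll /dev/disk/by-id/` runs would give the complete set of links that were lost on the second boot.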
Fedora 35 with kernel-5.15.6-200.fc35.s390x doesn't seem to have the /dev/disk/by-id directory at all; looking further ...
Related: https://bugzilla.redhat.com/show_bug.cgi?id=1963192
Hmm, now I understand it even less. Booting kernel 5.14.18 with rd.udev.debug, the journal is full of LINK messages for the by-id symlinks from the 59-dasd rules file, but nothing is there, not even the /dev/disk/by-id directory ...
I wonder if there is a race condition between creating the actual symlinks and creating the /dev/disk/by-id/ directory ...
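One way to probe this would be to compare the symlinks udev has recorded for a device with what actually exists under /dev. This is a rough sketch, not a verified diagnostic: `expected_symlinks` and `missing_on_disk` are hypothetical helper names, and it assumes that `udevadm info --query=symlink` prints the link names relative to /dev, separated by spaces. Whether the udev database still lists the links in the failing case is not something established here.

```python
import os
import subprocess


def expected_symlinks(devnode):
    """Ask the udev database which symlinks it recorded for a device node.

    Assumption: the output is a space-separated list of names relative to /dev,
    e.g. "disk/by-id/ccw-0X5722 disk/by-path/...".
    """
    result = subprocess.run(
        ["udevadm", "info", "--query=symlink", f"--name={devnode}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.split()


def missing_on_disk(symlinks, dev_root="/dev"):
    """Return the recorded symlink names that do not exist below dev_root."""
    return [
        name
        for name in symlinks
        if not os.path.islink(os.path.join(dev_root, name))
    ]
```

On an affected guest, `missing_on_disk(expected_symlinks("/dev/dasdd"))` would then list the links udev believes it created but that are absent from the filesystem.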
I think messages like these explain the missing symlinks:
...
Dec 10 10:48:30 openshift-8.s390.bos.redhat.com systemd-udevd[420]: dasdd: No reference left for '/dev/disk/by-id/ccw-IBM.750000000FRB71.0233.22.00000000000027200000000000000000', removing
Dec 10 10:48:30 openshift-8.s390.bos.redhat.com systemd-udevd[420]: dasdd: Updating old device symlink '/dev/disk/by-id/ccw-0X5722', which is no longer belonging to this device.
Dec 10 10:48:30 openshift-8.s390.bos.redhat.com systemd-udevd[420]: dasdd: No reference left for '/dev/disk/by-id/ccw-0X5722', removing
Dec 10 10:48:30 openshift-8.s390.bos.redhat.com systemd-udevd[420]: dasdd: Updating old device symlink '/dev/disk/by-id/ccw-IBM.750000000FRB71.0233.22', which is no longer belonging to this device.
Dec 10 10:48:30 openshift-8.s390.bos.redhat.com systemd-udevd[420]: dasdd: No reference left for '/dev/disk/by-id/ccw-IBM.750000000FRB71.0233.22', removing
Dec 10 10:48:30 openshift-8.s390.bos.redhat.com systemd-udevd[420]: dasdd: Updating old device symlink '/dev/disk/by-id/ccw-IBM.750000000FRB71.0233.22.00000000000027200000000000000000', which is no longer belonging to this device.
Dec 10 10:48:30 openshift-8.s390.bos.redhat.com systemd-udevd[420]: dasdd: No reference left for '/dev/disk/by-id/ccw-IBM.750000000FRB71.0233.22.00000000000027200000000000000000', removing
Dec 10 10:48:30 openshift-8.s390.bos.redhat.com systemd-udevd[420]: dasdd: Updating old device symlink '/dev/disk/by-id/ccw-0X5722', which is no longer belonging to this device.
Dec 10 10:48:30 openshift-8.s390.bos.redhat.com systemd-udevd[420]: dasdd: No reference left for '/dev/disk/by-id/ccw-0X5722', removing
Dec 10 10:48:30 openshift-8.s390.bos.redhat.com systemd-udevd[420]: dasdd: Updating old device symlink '/dev/disk/by-id/ccw-IBM.750000000FRB71.0233.22', which is no longer belonging to this device.
Dec 10 10:48:30 openshift-8.s390.bos.redhat.com systemd-udevd[420]: dasdd: No reference left for '/dev/disk/by-id/ccw-IBM.750000000FRB71.0233.22', removing
ping me for a full log
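For anyone going through a full log, the removals can be tallied per device and link. A minimal sketch, assuming the debug message format quoted above (`<dev>: No reference left for '<link>', removing`); `count_removals` is a name invented for this example.

```python
import re
from collections import Counter

# Matches the systemd-udevd debug message quoted above, capturing the
# kernel device name and the symlink path that is being removed.
REMOVED_RE = re.compile(
    r"systemd-udevd\[\d+\]: (?P<dev>\S+): "
    r"No reference left for '(?P<link>[^']+)', removing"
)


def count_removals(journal_text):
    """Count 'No reference left ... removing' messages per (device, symlink)."""
    counts = Counter()
    for line in journal_text.splitlines():
        match = REMOVED_RE.search(line)
        if match:
            counts[(match.group("dev"), match.group("link"))] += 1
    return counts


# Three lines copied from the excerpt above.
sample = """\
Dec 10 10:48:30 openshift-8.s390.bos.redhat.com systemd-udevd[420]: dasdd: Updating old device symlink '/dev/disk/by-id/ccw-0X5722', which is no longer belonging to this device.
Dec 10 10:48:30 openshift-8.s390.bos.redhat.com systemd-udevd[420]: dasdd: No reference left for '/dev/disk/by-id/ccw-0X5722', removing
Dec 10 10:48:30 openshift-8.s390.bos.redhat.com systemd-udevd[420]: dasdd: No reference left for '/dev/disk/by-id/ccw-0X5722', removing
"""

for (dev, link), n in count_removals(sample).items():
    print(dev, link, n)  # dasdd /dev/disk/by-id/ccw-0X5722 2
```

Run against the output of `journalctl -b`, a link that shows up here but is missing from /dev/disk/by-id/ would point at udev removing it rather than never creating it.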
I'm currently unable to access the Red Hat BZ. I need to get an account first.
Can you share more details about your setup? How are the DASDs configured? Did you use chzdev -e to have a persistent configuration? Is it always the same DASDs that have this issue or is it random?
I've tried several reboots on an LPAR with 10 DASDs persistently configured using chzdev -e on a freshly installed F35 (tried both 5.15.6-200.fc35.s390x and 5.14.10-300.fc35.s390x + s390utils-2:2.17.0-2.fc35.s390x) but wasn't able to reproduce the issue so far.
my environment is
- z13 with z/VM 6.4.0
- the guest is Fedora 35 with kernel-5.14.18-300.fc35.s390x and systemd-udev-249.7-2.fc35.s390x
- DASDs configured with rd.dasd= and /etc/dasd.conf
- what links are present/missing is purely random
The original report is from OCP/RHEL-8.x with z/VM 7.2.0 on z13 and z15.
I suspect there might be something wrong with udev or kernel handling the devices, rather than the udev rules in s390utils which are pretty straightforward.
20220105-1028-udev.log.zip
created with
journalctl -b | grep systemd-udev > 20220105-1028-udev.log
ll /dev/disk/by-id/ >> 20220105-1028-udev.log
lsdasd
I was able to reproduce the issue myself now with:
- LPAR z14
- F35 5.14.10-300.fc35.s390x and systemd-udev-0:249.7-2.fc35.s390x
- DASDs configured with /etc/dasd.conf
I didn't see the problem when using chzdev to persistently configure the devices. Maybe you can give it a try to see how this behaves on your setup. Make sure to remove all DASDs from /etc/dasd.conf and then enable them via chzdev -e <devices> (you can specify a range here as well, e.g. 9300-930f).
I'll dig a bit deeper to see where the problem might be.
Have you also removed the rd.dasd= definitions from the kernel parameter line and used the zdev "rootfs mode" purely?
Right now I am testing with /etc/dasd.conf completely removed (both from the system and from the initrd) and the result is still not 100% reliable (4x all links created, 1x no links at all, 1x links for dasdc1 only).
I believe using the zdev persistent config doesn't matter. I have converted my system fully to zdev for DASDs and I am still getting random results with the "by-id" links. I suspect the problem is deeper, in udev or the kernel.