Locked SED drives and linux errors
adambmedent opened this issue · 12 comments
Hey all, I am using SEDs in a server environment for encryption. I have a process in place to unlock the drives once the server boots up; however, the locked drives seem to cause a number of issues on the Linux server during boot.
Is there any easy or known way to ignore locked SEDs during boot? Has anyone else ever run into this issue? Once the drives are unlocked, all is well.
[Wed Dec 13 06:49:50 2023] sd 8:0:1:0: [sday] tag#1059 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[Wed Dec 13 06:49:50 2023] sd 8:0:1:0: [sday] tag#1059 Sense Key : Illegal Request [current]
[Wed Dec 13 06:49:50 2023] sd 8:0:1:0: [sday] tag#1059 Add. Sense: Security conflict in translated device
[Wed Dec 13 06:49:50 2023] sd 8:0:1:0: [sday] tag#1059 CDB: Read(16) 88 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00
[Wed Dec 13 06:49:50 2023] I/O error, dev sdbo, sector 15002931712 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[Wed Dec 13 06:49:50 2023] Buffer I/O error on dev sdbo, logical block 1875366464, async page read
[Wed Dec 13 06:49:51 2023] Buffer I/O error on dev sdbo, logical block 1875366464, async page read
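For anyone wanting to spot affected drives from the kernel log, here is a minimal Python sketch that picks out the disks reporting the "Security conflict in translated device" sense shown above. The function name and regular expression are illustrative assumptions, and it is also an assumption that every locked drive reports exactly this sense text.

```python
import re

# Matches kernel log lines such as:
#   sd 8:0:1:0: [sday] tag#1059 Add. Sense: Security conflict in translated device
SENSE_RE = re.compile(
    r"sd (?P<hctl>\d+:\d+:\d+:\d+): \[(?P<disk>sd[a-z]+)\].*"
    r"Add\. Sense: Security conflict in translated device"
)


def locked_disks_from_dmesg(text):
    """Return {disk name: SCSI address} for disks that logged the security-conflict sense."""
    found = {}
    for line in text.splitlines():
        match = SENSE_RE.search(line)
        if match:
            found[match.group("disk")] = match.group("hctl")
    return found


# Sample lines copied from the log in this post.
sample = """\
[Wed Dec 13 06:49:50 2023] sd 8:0:1:0: [sday] tag#1059 Sense Key : Illegal Request [current]
[Wed Dec 13 06:49:50 2023] sd 8:0:1:0: [sday] tag#1059 Add. Sense: Security conflict in translated device
[Wed Dec 13 06:49:50 2023] I/O error, dev sdbo, sector 15002931712 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
"""
print(locked_disks_from_dmesg(sample))
```

On a live system the text could come from `dmesg` output captured with `subprocess.run`; the sketch only reads the log and does not change how the kernel or udev treats the drive.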
Well, it looks like I got into a worse situation while testing.
It seems the one drive I was testing with is bricked in some way.
I ran the following, powered down the box, and powered it back up. Once it came back online, I could no longer do anything with the drive.
sedutil-cli --initialsetup PASSWORD /dev/sddq
sedutil-cli --enablelockingrange 0 PASSWORD /dev/sddq
sedutil-cli --setlockingrange 0 rw PASSWORD /dev/sddq
sedutil-cli --setmbrenable off PASSWORD /dev/sddq
root@BunkSnapVaultProx:/dev/disk/by-id# ls -ltrh | grep 4441
lrwxrwxrwx 1 root root 10 Dec 13 08:35 ata-SAMSUNG_MZ7L37T6HBLA-00A07_S6EPNN0W504441 -> ../../sddq
root@BunkSnapVaultProx:~# sedutil-cli --query /dev/sddq
Invalid or unsupported disk /dev/sddq
Bummer, it seems sedutil bricked my drive in some way. I can't even see information via hdparm anymore.
root@BunkSnapVaultProx:/dev/disk/by-id# hdparm -I /dev/disk/by-id/ata-SAMSUNG_MZ7L37T6HBLA-00A07_S6EPNN0W504441
/dev/disk/by-id/ata-SAMSUNG_MZ7L37T6HBLA-00A07_S6EPNN0W504441:
SG_IO: bad/missing sense data, sb[]: 70 00 0b 00 00 00 00 0a 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ATA device, with non-removable media
Standards:
Likely used: 1
Configuration:
Logical max current
cylinders 0 0
heads 0 0
sectors/track 0 0
--
Logical/Physical Sector size: 512 bytes
device size with M = 1024*1024: 0 MBytes
device size with M = 1000*1000: 0 MBytes
cache/buffer size = unknown
Capabilities:
IORDY not likely
Cannot perform double-word IO
R/W multiple sector transfer: not supported
DMA: not supported
PIO: pio0
Once in a while I'll see this.
root@BunkSnapVaultProx:~# sedutil-cli --query /dev/sdbu
Properties exchange failed
/dev/sdbu ATA SAMSUNG MZ7L37T6HBLA-00A07 JXTC304Q S6EPNN0W504441
TPer function (0x0001)
ACKNAK = N, ASYNC = N. BufferManagement = N, comIDManagement = N, Streaming = Y, SYNC = Y
Locking function (0x0002)
Locked = Y, LockingEnabled = Y, LockingSupported = Y, MBRDone = N, MBREnabled = N, MBRAbsent = N, MediaEncrypt = Y
Geometry function (0x0003)
Align = Y, Alignment Granularity = 8 (4096), Logical Block size = 512, Lowest Aligned LBA = 0
DataStore function (0x0202)
Max Tables = 9, Max Size Tables = 10485760, Table size alignment = 1
OPAL 2.0 function (0x0203)
Base comID = 0x1004, Initial PIN = 0x00, Reverted PIN = 0x00, comIDs = 1
Locking Admins = 4, Locking Users = 9, Range Crossing = N
So I know the drive is locked.
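Reading the lock state off the query output can be scripted. Below is a minimal Python sketch that pulls the flags from the "Locking function" section of `sedutil-cli --query` text. The function name is hypothetical, and the parser assumes the output layout matches the paste above (flags on the line right after the section header).

```python
import re


def parse_locking_flags(query_output):
    """Return the 'Locking function' flags as a dict of booleans, or None if absent."""
    lines = query_output.splitlines()
    for i, line in enumerate(lines):
        if line.strip().startswith("Locking function") and i + 1 < len(lines):
            # The flags are printed as "Name = Y" / "Name = N" pairs on the next line.
            pairs = re.findall(r"(\w+)\s*=\s*([YN])\b", lines[i + 1])
            return {name: value == "Y" for name, value in pairs}
    # No feature list was printed, e.g. "Invalid or unsupported disk".
    return None


# Sample text copied from the query output in this post.
sample = """\
Properties exchange failed
/dev/sdbu ATA SAMSUNG MZ7L37T6HBLA-00A07 JXTC304Q S6EPNN0W504441
TPer function (0x0001)
ACKNAK = N, ASYNC = N. BufferManagement = N, comIDManagement = N, Streaming = Y, SYNC = Y
Locking function (0x0002)
Locked = Y, LockingEnabled = Y, LockingSupported = Y, MBRDone = N, MBREnabled = N, MBRAbsent = N, MediaEncrypt = Y
"""
flags = parse_locking_flags(sample)
print(flags["Locked"], flags["MBREnabled"])
```

Returning `None` when the section is missing keeps "could not query the drive at all" distinct from "queried and found unlocked", which matters here because the query itself fails most of the time.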
Most of the time I see this.
root@BunkSnapVaultProx:~# sedutil-cli --query /dev/sdbu
Invalid or unsupported disk /dev/sdbu
I have to be missing something major here.
Similar with this command as well.
root@BunkSnapVaultProx:~# sedutil-cli --setlockingrange 0 rw MyPass /dev/sdbu
Invalid or unsupported disk /dev/sdbu
root@BunkSnapVaultProx:~# sedutil-cli --setlockingrange 0 rw t MyPass /dev/sdbu
Invalid or unsupported disk /dev/sdbu
root@BunkSnapVaultProx:~# sedutil-cli --setlockingrange 0 rw MyPass /dev/sdbu
Invalid or unsupported disk /dev/sdbu
root@BunkSnapVaultProx:~# sedutil-cli --setlockingrange 0 rw MyPass /dev/sdbu
Properties exchange failed
unsigned int requested for token is unsupported
Testing with another drive, hit the same issue once the drive is locked. This time I manually locked the drive.
root@BunkSnapVaultProx:/dev/disk/by-id# sedutil-cli --setlockingrange 0 ro PASSWORD /dev/sdcn
LockingRange0 set to RO
root@BunkSnapVaultProx:/dev/disk/by-id# mount /dev/sdcn /mnt
^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C
This looks good; the drive shouldn't mount when it's RO.
root@BunkSnapVaultProx:/dev/disk/by-id# sedutil-cli --setlockingrange 0 rw PASSWORD /dev/sdcn
LockingRange0 set to RW
root@BunkSnapVaultProx:/dev/disk/by-id# mount /dev/sdcn /mnt
root@BunkSnapVaultProx:/dev/disk/by-id# df -h
Filesystem Size Used Avail Use% Mounted on
udev 189G 0 189G 0% /dev
tmpfs 38G 3.1M 38G 1% /run
/dev/mapper/pve-root 94G 26G 64G 29% /
tmpfs 189G 46M 189G 1% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
efivarfs 192K 53K 135K 29% /sys/firmware/efi/efivars
/dev/sda2 511M 304K 511M 1% /boot/efi
/dev/fuse 128M 20K 128M 1% /etc/pve
tmpfs 38G 0 38G 0% /run/user/0
/dev/sdcn 7.0T 28K 6.6T 1% /mnt
Then I locked the drive and I am in the same position as the other drive.
root@BunkSnapVaultProx:/dev/disk/by-id# sedutil-cli --setlockingrange 0 lk PASSWORD /dev/sdcn
LockingRange0 set to LK
root@BunkSnapVaultProx:~# sedutil-cli --setlockingrange 0 rw PASSWORD /dev/sdcn
Properties exchange failed
unsigned int requested for token is unsupported
I am guessing my only option is the PSID?
I think this is the second or third issue I've seen here now where someone on Linux had issues with booting because of encrypted non-boot drives. You could read #449, I don't know if that will help you, sorry.
For the record, I think Windows might have problems with this too, I think the OSes just might not like seeing drives on boot they cannot access?
A PSID revert should at least bring the drive back (hopefully). You could also try revertnoerase (see /wiki/Command-Syntax) to just disable locking.
Appreciate the input. I did try revertNoErase. That works fine as long as I don't power down the drive or put it into an LK state.
root@BunkSnapVaultProx:/dev/disk/by-id# sedutil-cli --revertNoErase PASSWORD /dev/sdcn
Invalid or unsupported disk /dev/sdcn
These drives are remote, so I have to make a 40-minute trip to the data center to get the PSID.
I had these working pretty well with hdparm; I'm not sure what in the world I'm missing with sedutil.
> These drives are remote, so I have to make a 40 minute trip to the data center to get the psid.
There's also reverttper, which uses your drive password, I think (I've never tried it). It might have limitations compared to a PSID revert, or it might work the same; I don't know. But it might also not work if revertnoerase doesn't work?
Reading #449 again, maybe the solution for that (or a variation) could work here, too: basically having MBREnable on at boot (even though you don't need it because you're not booting from the drive), just so that the OS doesn't freak out about the drive because it can't access it. If that is the problem. *shrug*
I gave that one a try with no luck as well.
root@BunkSnapVaultProx:/dev/disk/by-id# sedutil-cli --revertTPer PASSWORD /dev/sdbu
Invalid or unsupported disk /dev/sdbu
It's almost as if once the disk is locked, it's bricked. I'm not even sold that the PSID is going to help here.
root@BunkSnapVaultProx:/dev/disk/by-id# sedutil-cli --yesIreallywanttoERASEALLmydatausingthePSID TEST /dev/sdbu
Invalid or unsupported disk /dev/sdbu
Obviously the PSID isn't TEST, but I would expect a different failure output, like the one below from a disk that hasn't been locked yet.
root@BunkSnapVaultProx:/dev/disk/by-id# sedutil-cli --yesIreallywanttoERASEALLmydatausingthePSID TEST /dev/sdbd
method status code NOT_AUTHORIZED
Session start failed rc = 1
EndSession Failed
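The difference between the two failures can be made explicit in a script. This is a minimal Python sketch that sorts sedutil-cli output into rough categories using only the message strings seen in this thread. The category names and their interpretation are assumptions, not anything sedutil itself defines: "Invalid or unsupported disk" appears to mean the tool gave up before opening a session with the drive, while NOT_AUTHORIZED means the drive answered and rejected the credential.

```python
def classify_sedutil_failure(output):
    """Map sedutil-cli output to a rough failure category (labels are illustrative)."""
    if "Invalid or unsupported disk" in output:
        # The tool bailed out before any session with the drive was attempted.
        return "not-identified"
    if "NOT_AUTHORIZED" in output:
        # The drive responded and refused the supplied password or PSID.
        return "credential-rejected"
    if "Properties exchange failed" in output:
        # Seen both alongside a successful query and alongside a failed unlock.
        return "properties-exchange-failed"
    return "unknown"


locked_disk = "Invalid or unsupported disk /dev/sdbu"
unlocked_disk = """\
method status code NOT_AUTHORIZED
Session start failed rc = 1
EndSession Failed
"""
print(classify_sedutil_failure(locked_disk))
print(classify_sedutil_failure(unlocked_disk))
```

A "credential-rejected" result with a deliberately wrong PSID is the healthy case: it shows the drive is still talking to the host.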
You might have to try taking the drive(s) out and doing this on another system. I hope they're not really bricked and it's just something going wrong with the OS? Do you have an external adapter that supports OPAL commands (sadly not all USB ones work, apparently those with a SATA controller do, and any Thunderbolt ones should work I think? I also have an NVME-to-USB one that works)? Or just a different computer to plug them into?
If they're actually bricked then this is of course a giant issue with sedutil (and sorry for your loss, oof). Wish we could put that into the README like really big on top or something but this repo is basically dead in terms of development and nobody else has control over it.
Is there a live CD/DVD OS I could use? I do have access to the IPMI port to mount remote ISOs, etc.
No idea, sorry.
Can't you just try a Linux OS ISO, all the distros can boot as a live OS, right? And I don't know if Windows has anything like that these days. (Would be interesting to see if things are different on Windows, imo...)
However, if the issue is connected to the hardware in the system then I would assume that a live OS wouldn't help. But if you can't take the drives out then I guess that's all you can try? :/
> Is there a live CD/DVD OS I could use? I do have access to the ipmi port to mount remote iso's etc.
Pretty much any live Linux will do, once you set the libata.allow_tpm=1 kernel parameter. You will need to copy the sedutil-cli binary there, as it is absent from most repositories.
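To confirm the parameter actually took effect after boot, the value can be read back from sysfs. A minimal Python sketch follows; the function name is hypothetical, and it assumes the libata module exposes the parameter at `/sys/module/libata/parameters/allow_tpm`. Note that this setting only concerns drives handled by libata; nothing in this thread establishes whether it matters for drives behind a SAS HBA.

```python
from pathlib import Path

# Assumed sysfs location of the libata.allow_tpm module parameter.
ALLOW_TPM = Path("/sys/module/libata/parameters/allow_tpm")


def allow_tpm_enabled(param_file=ALLOW_TPM):
    """Return True/False for the allow_tpm setting, or None if the file cannot be read."""
    try:
        return param_file.read_text().strip() == "1"
    except OSError:
        # libata not loaded, or not a Linux system.
        return None


print(allow_tpm_enabled())
```

Taking the path as an argument keeps the check easy to try against a scratch file before running it on the server.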
There is also Seagate/TCGstorageAPI which might provide more useful information (requires building from source though). The front part is in Python, so it is easy to write scripts for debugging.
FWIW, I ran into the exact same issue on the Samsung MZ7L37T6HBLA drives. As soon as the drives are locked, SCSI and I/O errors show up in dmesg and the drives become unusable.
dmesg -w
mpt3sas_cm0: log_info(0x31110e03): originator(PL), code(0x11), sub_code(0x0e03)
sd 4:0:1:0: Power-on or device reset occurred
It seems like udev gets into a loop with the kernel's SCSI POWER_ON_RESET_OCCURRED events: udev tries to access the locked device, the drive generates a POWER_ON_RESET_OCCURRED event, and that causes udev to try to access the device again.
You can see the events via:
sudo udevadm monitor -p
monitor will print the received events for:
UDEV - the event which udev sends out after rule processing
KERNEL - the kernel uevent
KERNEL[79934.766131] change /devices/pci0000:80/0000:80:03.2/0000:87:00.0/host4/port-4:0/expander-4:0/port-4:0:1/end_device-4:0:1/target4:0:1/4:0:1:0 (scsi)
ACTION=change
DEVPATH=/devices/pci0000:80/0000:80:03.2/0000:87:00.0/host4/port-4:0/expander-4:0/port-4:0:1/end_device-4:0:1/target4:0:1/4:0:1:0
SUBSYSTEM=scsi
SDEV_UA=POWER_ON_RESET_OCCURRED
DEVTYPE=scsi_device
DRIVER=sd
MODALIAS=scsi:t-0x00
SEQNUM=12593
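To see how fast these events repeat, the monitor output can be tallied per device. Here is a minimal Python sketch that counts KERNEL uevents carrying SDEV_UA=POWER_ON_RESET_OCCURRED, keyed by DEVPATH. The function names are hypothetical, and the parser assumes the `udevadm monitor -p` layout shown above (a `KERNEL[...]` or `UDEV[...]` header line followed by KEY=VALUE lines).

```python
from collections import Counter


def parse_events(monitor_text):
    """Split `udevadm monitor -p` text into one dict of properties per event."""
    events = []
    current = None
    for raw in monitor_text.splitlines():
        line = raw.strip()
        if line.startswith(("KERNEL[", "UDEV[")):
            # A header line starts a new event; remember which side emitted it.
            current = {"_source": line.split("[", 1)[0]}
            events.append(current)
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            current[key] = value
    return events


def count_reset_events(monitor_text):
    """Count kernel uevents that report a power-on/reset unit attention, per DEVPATH."""
    return Counter(
        event.get("DEVPATH", "?")
        for event in parse_events(monitor_text)
        if event["_source"] == "KERNEL"
        and event.get("SDEV_UA") == "POWER_ON_RESET_OCCURRED"
    )


# Sample event copied from the monitor output in this post.
DEVPATH = ("/devices/pci0000:80/0000:80:03.2/0000:87:00.0/host4/port-4:0/expander-4:0/"
           "port-4:0:1/end_device-4:0:1/target4:0:1/4:0:1:0")
sample = f"""\
KERNEL[79934.766131] change {DEVPATH} (scsi)
ACTION=change
DEVPATH={DEVPATH}
SUBSYSTEM=scsi
SDEV_UA=POWER_ON_RESET_OCCURRED
"""
print(count_reset_events(sample))
```

A count that keeps climbing for one DEVPATH while the drive sits locked would be consistent with the loop described above.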
The systemd-udevd debug messages show it too, specifically Failed to run builtin 'blkid': Input/output error:
$ sudo udevadm control --log-priority=debug
Mar 09 00:06:40 samsung systemd-udevd[17058]: sdb: /usr/lib/udev/rules.d/60-persistent-storage.rules:109 Failed to run builtin 'blkid': Input/output error
Mar 09 00:06:40 samsung systemd-udevd[17058]: sdb: /usr/lib/udev/rules.d/60-persistent-storage.rules:119 LINK 'disk/by-id/wwn-0x5002538f02c309dc'
Mar 09 00:06:40 samsung systemd-udevd[17058]: sdb: Handling device node '/dev/sdb', devnum=b8:16
Mar 09 00:06:40 samsung systemd-udevd[17058]: sdb: Preserve already existing symlink '/dev/block/8:16' to '../sdb'
Mar 09 00:06:40 samsung systemd-udevd[17058]: sdb: Found 'b8:16' claiming '/run/udev/links/\x2fdisk\x2fby-id\x2fscsi-SATA_SAMSUNG_MZ7L37T6_S6EPNA0TC03590'
Mar 09 00:06:40 samsung systemd-udevd[17058]: sdb: Preserve already existing symlink '/dev/disk/by-id/scsi-SATA_SAMSUNG_MZ7L37T6_S6EPNA0TC03590' to '../../sdb'
Mar 09 00:06:40 samsung systemd-udevd[17058]: sdb: Found 'b8:16' claiming '/run/udev/links/\x2fdisk\x2fby-path\x2fpci-0000:87:00.0-sas-exp0x500056b31054e8ff-phy1-lun-0'
Mar 09 00:06:40 samsung systemd-udevd[17058]: sdb: Preserve already existing symlink '/dev/disk/by-path/pci-0000:87:00.0-sas-exp0x500056b31054e8ff-phy1-lun-0' to '../../sdb'
Mar 09 00:06:40 samsung systemd-udevd[17058]: sdb: Found 'b8:16' claiming '/run/udev/links/\x2fdisk\x2fby-id\x2fscsi-35002538f02c309dc'
Mar 09 00:06:40 samsung systemd-udevd[17058]: sdb: Preserve already existing symlink '/dev/disk/by-id/scsi-35002538f02c309dc' to '../../sdb'
Mar 09 00:06:40 samsung systemd-udevd[17058]: sdb: Found 'b8:16' claiming '/run/udev/links/\x2fdisk\x2fby-id\x2fwwn-0x5002538f02c309dc'
Mar 09 00:06:40 samsung systemd-udevd[17058]: sdb: Preserve already existing symlink '/dev/disk/by-id/wwn-0x5002538f02c309dc' to '../../sdb'
Mar 09 00:06:40 samsung systemd-udevd[17058]: sdb: sd-device: Created db file '/run/udev/data/b8:16' for '/devices/pci0000:80/0000:80:03.2/0000:87:00.0/host4/port-4:0/expander-4:0/port-4:0:1/end_device-4:0:1/target4:0:1/4:0:1:0/block/sdb'
Mar 09 00:06:40 samsung systemd-udevd[17058]: sdb: Adding watch on '/dev/sdb'
Mar 09 00:06:40 samsung systemd-udevd[17058]: sdb: sd-device: Created db file '/run/udev/data/b8:16' for '/devices/pci0000:80/0000:80:03.2/0000:87:00.0/host4/port-4:0/expander-4:0/port-4:0:1/end_device-4:0:1/target4:0:1/4:0:1:0/block/sdb'
I couldn't figure out a proper way to fix this. Ideally we'd want udev to ignore locked devices somehow, or perhaps this is a drive firmware issue, as I have seen other devices still respond to the blkid command even when locked. These devices seem to generate SCSI POWER_ON_RESET_OCCURRED for any command when locked.
The workaround is to stop udev and unlock the device:
$ sudo systemctl stop systemd-udevd systemd-udevd-kernel.socket systemd-udevd-control.socket