Drive-Trust-Alliance/sedutil

Locked SED drives and linux errors

adambmedent opened this issue · 12 comments

Hey all, I am using SEDs in a server environment for encryption. I have a process in place to unlock the drives once the server boots up; however, the locked drives seem to cause a number of issues on the Linux server during boot.

Is there any easy or known way to ignore locked SEDs during boot? Has anyone else ever run into this issue? Once the drives are unlocked all is well.

[Wed Dec 13 06:49:50 2023] sd 8:0:1:0: [sday] tag#1059 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[Wed Dec 13 06:49:50 2023] sd 8:0:1:0: [sday] tag#1059 Sense Key : Illegal Request [current]
[Wed Dec 13 06:49:50 2023] sd 8:0:1:0: [sday] tag#1059 Add. Sense: Security conflict in translated device
[Wed Dec 13 06:49:50 2023] sd 8:0:1:0: [sday] tag#1059 CDB: Read(16) 88 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00
[Wed Dec 13 06:49:50 2023] I/O error, dev sdbo, sector 15002931712 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[Wed Dec 13 06:49:50 2023] Buffer I/O error on dev sdbo, logical block 1875366464, async page read
[Wed Dec 13 06:49:51 2023] Buffer I/O error on dev sdbo, logical block 1875366464, async page read
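
To list which disks are logging the locked-drive error, the kernel log can be filtered for the sense message shown above. This is only a sketch: `locked_sed_devices` is a made-up helper name, and it assumes every locked drive reports the same "Security conflict in translated device" text that these Samsung drives do.

```shell
# Hypothetical helper: read kernel log text on stdin and print the unique
# sd device names (e.g. "sday") that logged a "Security conflict" sense error.
locked_sed_devices() {
    grep 'Security conflict in translated device' \
        | sed -n 's/.*\[\(sd[a-z][a-z]*\)\].*/\1/p' \
        | sort -u
}
```

Something like `dmesg | locked_sed_devices` would then print one device name per line.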

Well, looks like I got into a worse situation while testing.

Seems the one drive I was testing with is bricked in some way.

I ran the following, powered down the box, and powered it back up. Once it came back online I could no longer do anything with the drive.
sedutil-cli --initialsetup PASSWORD /dev/sddq
sedutil-cli --enablelockingrange 0 PASSWORD /dev/sddq
sedutil-cli --setlockingrange 0 rw PASSWORD /dev/sddq
sedutil-cli --setmbrenable off PASSWORD /dev/sddq

root@BunkSnapVaultProx:/dev/disk/by-id# ls -ltrh | grep 4441
lrwxrwxrwx 1 root root 10 Dec 13 08:35 ata-SAMSUNG_MZ7L37T6HBLA-00A07_S6EPNN0W504441 -> ../../sddq
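
Kernel names like sddq are not stable: the drive with the serial ending in 504441 appears as sddq here and as sdbu later in this thread. A small sketch for resolving the by-id link to whatever node it currently points at (`resolve_disk` is a hypothetical helper name; it assumes a `readlink` that supports `-f`, as GNU coreutils and BusyBox do):

```shell
# Hypothetical helper: resolve a /dev/disk/by-id symlink to the device node
# it currently points at, so the right sdX is used after a reboot.
resolve_disk() {
    readlink -f -- "$1"
}
```

For example, `resolve_disk /dev/disk/by-id/ata-SAMSUNG_MZ7L37T6HBLA-00A07_S6EPNN0W504441` would have printed /dev/sddq at the time of the listing above.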

root@BunkSnapVaultProx:~# sedutil-cli --query /dev/sddq
Invalid or unsupported disk /dev/sddq

Bummer, seems sedutil bricked my drive in some way. I can't even see information via hdparm anymore.

root@BunkSnapVaultProx:/dev/disk/by-id# hdparm -I /dev/disk/by-id/ata-SAMSUNG_MZ7L37T6HBLA-00A07_S6EPNN0W504441

/dev/disk/by-id/ata-SAMSUNG_MZ7L37T6HBLA-00A07_S6EPNN0W504441:
SG_IO: bad/missing sense data, sb[]: 70 00 0b 00 00 00 00 0a 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

ATA device, with non-removable media
Standards:
Likely used: 1
Configuration:
Logical max current
cylinders 0 0
heads 0 0
sectors/track 0 0
--
Logical/Physical Sector size: 512 bytes
device size with M = 1024*1024: 0 MBytes
device size with M = 1000*1000: 0 MBytes
cache/buffer size = unknown
Capabilities:
IORDY not likely
Cannot perform double-word IO
R/W multiple sector transfer: not supported
DMA: not supported
PIO: pio0

Once in a while I'll see this.

root@BunkSnapVaultProx:~# sedutil-cli --query /dev/sdbu
Properties exchange failed

/dev/sdbu ATA SAMSUNG MZ7L37T6HBLA-00A07 JXTC304Q S6EPNN0W504441
TPer function (0x0001)
ACKNAK = N, ASYNC = N. BufferManagement = N, comIDManagement = N, Streaming = Y, SYNC = Y
Locking function (0x0002)
Locked = Y, LockingEnabled = Y, LockingSupported = Y, MBRDone = N, MBREnabled = N, MBRAbsent = N, MediaEncrypt = Y
Geometry function (0x0003)
Align = Y, Alignment Granularity = 8 (4096), Logical Block size = 512, Lowest Aligned LBA = 0
DataStore function (0x0202)
Max Tables = 9, Max Size Tables = 10485760, Table size alignment = 1
OPAL 2.0 function (0x0203)
Base comID = 0x1004, Initial PIN = 0x00, Reverted PIN = 0x00, comIDs = 1
Locking Admins = 4, Locking Users = 9, Range Crossing = N

So I know the drive is locked.
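
The locked state can also be pulled out of the query output by a script instead of read by eye. A minimal sketch, assuming the "Name = Y/N" comma-separated layout shown above; `locking_field` is a made-up helper name, not part of sedutil:

```shell
# Hypothetical helper: read `sedutil-cli --query` output on stdin and print
# the Y/N value of one field, e.g. Locked, LockingEnabled or MBRDone.
locking_field() {
    field=$1
    # Put each "Name = Value" pair on its own line, then match the exact name
    # so that "Locked" does not also match "LockingEnabled".
    tr ',' '\n' | sed -n "s/^ *${field} = \([YN]\).*/\1/p" | head -n 1
}
```

With a drive that answers the query, `sedutil-cli --query /dev/sdbu | locking_field Locked` should print Y for the output above.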

Most of the time I see this.

root@BunkSnapVaultProx:~# sedutil-cli --query /dev/sdbu
Invalid or unsupported disk /dev/sdbu

I have to be missing something major here.

Similar with this command as well.

root@BunkSnapVaultProx:~# sedutil-cli --setlockingrange 0 rw MyPass /dev/sdbu
Invalid or unsupported disk /dev/sdbu
root@BunkSnapVaultProx:~# sedutil-cli --setlockingrange 0 rw t MyPass /dev/sdbu
Invalid or unsupported disk /dev/sdbu
root@BunkSnapVaultProx:~# sedutil-cli --setlockingrange 0 rw MyPass /dev/sdbu
Invalid or unsupported disk /dev/sdbu
root@BunkSnapVaultProx:~# sedutil-cli --setlockingrange 0 rw MyPass /dev/sdbu
Properties exchange failed
unsigned int requested for token is unsupported

Testing with another drive, I hit the same issue once the drive was locked. This time I locked the drive manually.

root@BunkSnapVaultProx:/dev/disk/by-id# sedutil-cli --setlockingrange 0 ro PASSWORD /dev/sdcn
LockingRange0 set to RO

root@BunkSnapVaultProx:/dev/disk/by-id# mount /dev/sdcn /mnt
^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C

This looks good; the drive shouldn't mount when it's RO.

root@BunkSnapVaultProx:/dev/disk/by-id# sedutil-cli --setlockingrange 0 rw PASSWORD /dev/sdcn
LockingRange0 set to RW

root@BunkSnapVaultProx:/dev/disk/by-id# mount /dev/sdcn /mnt
root@BunkSnapVaultProx:/dev/disk/by-id# df -h
Filesystem Size Used Avail Use% Mounted on
udev 189G 0 189G 0% /dev
tmpfs 38G 3.1M 38G 1% /run
/dev/mapper/pve-root 94G 26G 64G 29% /
tmpfs 189G 46M 189G 1% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
efivarfs 192K 53K 135K 29% /sys/firmware/efi/efivars
/dev/sda2 511M 304K 511M 1% /boot/efi
/dev/fuse 128M 20K 128M 1% /etc/pve
tmpfs 38G 0 38G 0% /run/user/0
/dev/sdcn 7.0T 28K 6.6T 1% /mnt

Then I locked the drive and I am in the same position as the other drive.

root@BunkSnapVaultProx:/dev/disk/by-id# sedutil-cli --setlockingrange 0 lk PASSWORD /dev/sdcn
LockingRange0 set to LK

root@BunkSnapVaultProx:~# sedutil-cli --setlockingrange 0 rw PASSWORD /dev/sdcn
Properties exchange failed
unsigned int requested for token is unsupported

I am guessing my only option is the PSID?

I think this is the second or third issue I've seen here now where someone on Linux had issues with booting because of encrypted non-boot drives. You could read #449, I don't know if that will help you, sorry.

For the record, I think Windows might have problems with this too, I think the OSes just might not like seeing drives on boot they cannot access?

A PSID revert should at least bring the drive back (hopefully). You could also try revertnoerase (see /wiki/Command-Syntax) to just disable locking.

Appreciate the input. I did try revertNoErase. That works OK as long as I don't power down the drive or put it into an LK state.

root@BunkSnapVaultProx:/dev/disk/by-id# sedutil-cli --revertNoErase PASSWORD /dev/sdcn
Invalid or unsupported disk /dev/sdcn

These drives are remote, so I have to make a 40-minute trip to the data center to get the PSID.

I had these working pretty well with hdparm; I'm not sure what in the world I'm missing with sedutil.

There's also reverttper, which I think uses your drive password (I've never tried it). It might have limitations compared to a PSID revert, or it might work the same; I don't know. But it might also not work if revertnoerase doesn't work?

Reading #449 again, maybe the solution for that (or a variation) could work here, too: basically having MBREnable on at boot (even though you don't need it because you're not booting from the drive) just so that the OS doesn't freak out about a drive it can't access. If that is the problem. *shrug*

I gave that one a try with no luck as well.

root@BunkSnapVaultProx:/dev/disk/by-id# sedutil-cli --revertTPer PASSWORD /dev/sdbu
Invalid or unsupported disk /dev/sdbu

It's almost as if once the disk is locked, it's bricked. I'm not even sold the PSID is going to help here.

root@BunkSnapVaultProx:/dev/disk/by-id# sedutil-cli --yesIreallywanttoERASEALLmydatausingthePSID TEST /dev/sdbu
Invalid or unsupported disk /dev/sdbu

Obviously the PSID isn't TEST, but I would expect a different failure output, like the one below, on a disk that hasn't been locked yet.

root@BunkSnapVaultProx:/dev/disk/by-id# sedutil-cli --yesIreallywanttoERASEALLmydatausingthePSID TEST /dev/sdbd
method status code NOT_AUTHORIZED
Session start failed rc = 1
EndSession Failed
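
The difference between the two outputs is the useful signal here: NOT_AUTHORIZED means the drive answered and rejected the credential, while "Invalid or unsupported disk" shows up before any credential seems to be checked. A rough sketch for telling them apart in a script follows; the function name and the labels it prints are invented for illustration, and only the matched strings come from the output in this thread.

```shell
# Hypothetical helper: map sedutil-cli error text (passed as $1) to a coarse
# label. The labels are made up; the matched strings are from this thread.
classify_sedutil_error() {
    case $1 in
        *'Invalid or unsupported disk'*) echo no-response ;;
        *NOT_AUTHORIZED*)                echo wrong-credential ;;
        *'Properties exchange failed'*)  echo properties-failed ;;
        *)                               echo unknown ;;
    esac
}
```

A call such as `classify_sedutil_error "$(sedutil-cli --query /dev/sdbu 2>&1)"` would then give one label per attempt, which makes the intermittent behaviour easier to log.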

You might have to try taking the drive(s) out and doing this on another system. I hope they're not really bricked and it's just something going wrong with the OS. Do you have an external adapter that supports OPAL commands? Sadly not all USB ones work; apparently those with a SATA controller do, and I think any Thunderbolt ones should work (I also have an NVMe-to-USB one that works). Or just a different computer to plug them into?

If they're actually bricked then this is of course a giant issue with sedutil (and sorry for your loss, oof). Wish we could put that into the README like really big on top or something but this repo is basically dead in terms of development and nobody else has control over it.

Is there a live CD/DVD OS I could use? I do have access to the IPMI port to mount remote ISOs etc.

No idea, sorry.
Can't you just try a Linux OS ISO, all the distros can boot as a live OS, right? And I don't know if Windows has anything like that these days. (Would be interesting to see if things are different on Windows, imo...)
However, if the issue is connected to the hardware in the system then I would assume that a live OS wouldn't help. But if you can't take the drives out then I guess that's all you can try? :/

youk commented

Pretty much any live Linux will do once you set the libata.allow_tpm=1 kernel parameter. You will need to copy the sedutil-cli binary there, as it is absent from most repositories.
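
Whether the parameter actually took effect can be checked from the running system. A minimal sketch, assuming the libata module exposes it at /sys/module/libata/parameters/allow_tpm; `allow_tpm_state` is a hypothetical helper name, and the optional argument exists only so the function can be pointed at another file:

```shell
# Hypothetical helper: print "enabled", "disabled" or "unknown" depending on
# the libata allow_tpm parameter file (default path is an assumption).
allow_tpm_state() {
    param_file=${1:-/sys/module/libata/parameters/allow_tpm}
    if [ ! -r "$param_file" ]; then
        echo unknown
        return 0
    fi
    case $(cat "$param_file") in
        1|Y|y) echo enabled ;;
        *)     echo disabled ;;
    esac
}
```

If it prints disabled, the parameter has to go on the kernel command line in the boot loader before the live system starts.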

There is also Seagate/TCGstorageAPI which might provide more useful information (requires building from source though). The front part is in Python, so it is easy to write scripts for debugging.

FWIW I ran into the exact same issue on the Samsung MZ7L37T6HBLA drives.
As soon as the drives are locked, SCSI and I/O errors show up in dmesg and the drives become unusable.

dmesg -w
mpt3sas_cm0: log_info(0x31110e03): originator(PL), code(0x11), sub_code(0x0e03)
 sd 4:0:1:0: Power-on or device reset occurred

Seems like udev gets into a loop with the kernel's SCSI POWER_ON_RESET_OCCURRED events: udev tries to access the locked device, the locked drive generates a POWER_ON_RESET_OCCURRED event, and that event causes udev to try to access the device again.

You can see the events via:

sudo udevadm monitor -p
monitor will print the received events for:
UDEV - the event which udev sends out after rule processing
KERNEL - the kernel uevent

KERNEL[79934.766131] change   /devices/pci0000:80/0000:80:03.2/0000:87:00.0/host4/port-4:0/expander-4:0/port-4:0:1/end_device-4:0:1/target4:0:1/4:0:1:0 (scsi)
ACTION=change
DEVPATH=/devices/pci0000:80/0000:80:03.2/0000:87:00.0/host4/port-4:0/expander-4:0/port-4:0:1/end_device-4:0:1/target4:0:1/4:0:1:0
SUBSYSTEM=scsi
SDEV_UA=POWER_ON_RESET_OCCURRED
DEVTYPE=scsi_device
DRIVER=sd
MODALIAS=scsi:t-0x00
SEQNUM=12593
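
To confirm that a device really is looping, the reset events can be counted per device path over a short capture. This is a sketch under the assumption that the property layout matches the event above (DEVPATH printed before SDEV_UA within each event); `count_reset_events` is a made-up helper name.

```shell
# Hypothetical helper: read `udevadm monitor -p` output on stdin and print
# "<count> <devpath>" for every device that reported a
# POWER_ON_RESET_OCCURRED unit attention.
count_reset_events() {
    awk -F= '
        $1 == "DEVPATH" { path = $2 }
        $1 == "SDEV_UA" && $2 == "POWER_ON_RESET_OCCURRED" { count[path]++ }
        END { for (p in count) print count[p], p }
    '
}
```

One way to feed it would be `sudo timeout 10 udevadm monitor --kernel --property | count_reset_events`; a count that keeps climbing for one path over ten seconds points at the loop described above.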

You can also see the systemd-udevd debug messages, specifically Failed to run builtin 'blkid': Input/output error:

$ sudo udevadm control --log-priority=debug

Mar 09 00:06:40 samsung systemd-udevd[17058]: sdb: /usr/lib/udev/rules.d/60-persistent-storage.rules:109 Failed to run builtin 'blkid': Input/output error                      
Mar 09 00:06:40 samsung systemd-udevd[17058]: sdb: /usr/lib/udev/rules.d/60-persistent-storage.rules:119 LINK 'disk/by-id/wwn-0x5002538f02c309dc'                               
Mar 09 00:06:40 samsung systemd-udevd[17058]: sdb: Handling device node '/dev/sdb', devnum=b8:16                                                                                
Mar 09 00:06:40 samsung systemd-udevd[17058]: sdb: Preserve already existing symlink '/dev/block/8:16' to '../sdb'                                                              
Mar 09 00:06:40 samsung systemd-udevd[17058]: sdb: Found 'b8:16' claiming '/run/udev/links/\x2fdisk\x2fby-id\x2fscsi-SATA_SAMSUNG_MZ7L37T6_S6EPNA0TC03590'                      
Mar 09 00:06:40 samsung systemd-udevd[17058]: sdb: Preserve already existing symlink '/dev/disk/by-id/scsi-SATA_SAMSUNG_MZ7L37T6_S6EPNA0TC03590' to '../../sdb'                 
Mar 09 00:06:40 samsung systemd-udevd[17058]: sdb: Found 'b8:16' claiming '/run/udev/links/\x2fdisk\x2fby-path\x2fpci-0000:87:00.0-sas-exp0x500056b31054e8ff-phy1-lun-0'        
Mar 09 00:06:40 samsung systemd-udevd[17058]: sdb: Preserve already existing symlink '/dev/disk/by-path/pci-0000:87:00.0-sas-exp0x500056b31054e8ff-phy1-lun-0' to '../../sdb'   
Mar 09 00:06:40 samsung systemd-udevd[17058]: sdb: Found 'b8:16' claiming '/run/udev/links/\x2fdisk\x2fby-id\x2fscsi-35002538f02c309dc'                                         
Mar 09 00:06:40 samsung systemd-udevd[17058]: sdb: Preserve already existing symlink '/dev/disk/by-id/scsi-35002538f02c309dc' to '../../sdb'                                    
Mar 09 00:06:40 samsung systemd-udevd[17058]: sdb: Found 'b8:16' claiming '/run/udev/links/\x2fdisk\x2fby-id\x2fwwn-0x5002538f02c309dc'                                         
Mar 09 00:06:40 samsung systemd-udevd[17058]: sdb: Preserve already existing symlink '/dev/disk/by-id/wwn-0x5002538f02c309dc' to '../../sdb'                                    
Mar 09 00:06:40 samsung systemd-udevd[17058]: sdb: sd-device: Created db file '/run/udev/data/b8:16' for '/devices/pci0000:80/0000:80:03.2/0000:87:00.0/host4/port-4:0/expander-4:0/port-4:0:1/end_device-4:0:1/target4:0:1/4:0:1:0/block/sdb'
Mar 09 00:06:40 samsung systemd-udevd[17058]: sdb: Adding watch on '/dev/sdb'
Mar 09 00:06:40 samsung systemd-udevd[17058]: sdb: sd-device: Created db file '/run/udev/data/b8:16' for '/devices/pci0000:80/0000:80:03.2/0000:87:00.0/host4/port-4:0/expander-4:0/port-4:0:1/end_device-4:0:1/target4:0:1/4:0:1:0/block/sdb'

I couldn't figure out a proper fix. Ideally we'd want udev to ignore locked devices somehow, or perhaps this is a drive firmware issue, as I have seen other devices still respond to the blkid command even when locked. These devices seem to generate SCSI POWER_ON_RESET_OCCURRED for any command when locked.

The workaround is to stop udev and unlock the device:

$ sudo systemctl stop systemd-udevd systemd-udevd-kernel.socket systemd-udevd-control.socket