martijnvanbrummelen/nwipe

No Serial Number for SAS drive

Closed this issue · 23 comments

Doesn't show SN. Attached are nwipe -v, smartctl -i and smartctl -x logs

nwipe-shows no serial.log

nwipe-smartctl shows serial.log

Oddly enough nwipe and smartctl -i show no temperature, but smartctl -x does show:
nwipe-smartctl-temp.log

Is it possible it's a faulty drive with an intermittent problem?

=== START OF READ SMART DATA SECTION ===
SMART Health Status: DATA CHANNEL IMPENDING FAILURE DATA ERROR RATE TOO HIGH [asc=5d, ascq=32]

I'm not sure how you would prove it other than giving it a helping hand to its demise by doing a DOD 7 pass wipe with verification on every pass and seeing if that killed it, or at least how many errors occurred. Although maybe that error above relates only to the SMART data functionality.

One trick I sometimes use to recover data from a failing drive is seal it in a plastic bag and put it in the freezer for 30 mins to an hour. Pull it out and connect it real quick to your system and run all those commands again and see if they now work, they might do, or at least until the drive reaches a higher temperature again.

However, I'll take another look at those logs again just in case I'm missing something.

Yes this is a bad drive that I removed from production. It's still reacting so it needs to be wiped before discard. It's the first time I'm not seeing a serial number on a drive where it should normally show.

@Firminator Did we come to the conclusion that the data was intermittently available because of a fault on the drive? I'm just wondering whether to leave this issue open or close it?

Can you point me to the code or commit that adds Serial # detection? I might just want to look over it. It's odd that it's not detected. The drive has a bunch of bad sectors, but it identified perfectly fine to smartmontools and it also wiped without a hitch.

/src/temperature.c contains the key temperature detection functions.

nwipe_init_temperature( nwipe_context_t* c )

Each drive context ( a structure containing all information about a selected drive ) has to be matched against an entry in the hwmon directories. We already know that depending upon the type of drive the drive name will appear in different places in the directory structure. We need to match a drive name in the context with the matching drive name in hwmon, once we do we can then match the correct temperature with the correct drive. This function determines a path to the temperatures for a specific drive and writes it to the drive context for use by the function nwipe_update_temperature(). For every drive this path will be different. This is done once only at nwipe startup.

nwipe_update_temperature( nwipe_context_t* c )

This uses the path previously determined by the function nwipe_init_temperature(). It's called from the GUI code once every 60 seconds, reads the temperatures for each drive from the hwmon structure and places those temperatures in the appropriate drive context. The GUI code reads the drive contexts every 250ms (if I remember correctly) and updates the on screen temperature with temperature data found in the drive context.
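As a rough illustration of the update step, the sketch below reads one hwmon temperature file and converts it to whole degrees Celsius. It assumes the usual kernel hwmon convention that tempN_input files hold millidegrees Celsius. The function names and the example path are hypothetical and are not taken from nwipe's temperature.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: convert the text of a hwmon tempN_input file
 * (millidegrees Celsius, e.g. "35000\n") to whole degrees Celsius.
 * Returns 0 on success, -1 if the text does not start with a number. */
int parse_hwmon_millidegrees( const char* text, int* celsius )
{
    char* end = NULL;
    long value;

    assert( text != NULL && celsius != NULL );

    errno = 0;
    value = strtol( text, &end, 10 );
    if( end == text || errno != 0 )
    {
        return -1;
    }
    *celsius = (int) ( value / 1000 );
    return 0;
}

/* Hypothetical helper: read a file such as
 * /sys/class/hwmon/hwmon1/temp1_input (the exact path is an assumption).
 * Returns 0 on success, -1 if the file is missing or unreadable. */
int read_hwmon_temperature( const char* path, int* celsius )
{
    char buf[32];
    FILE* fp;
    int rc = -1;

    assert( path != NULL && celsius != NULL );

    fp = fopen( path, "r" );
    if( fp == NULL )
    {
        return -1;
    }
    if( fgets( buf, sizeof( buf ), fp ) != NULL )
    {
        rc = parse_hwmon_millidegrees( buf, celsius );
    }
    fclose( fp );
    return rc;
}
```

A caller that gets -1 back would simply leave the temperature in the drive context marked as unavailable.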

So when diagnosing this problem I would start with the following:

  • For this particular drive, can you consistently find the temperature anywhere under /sys/class/hwmon/hwmonX? If you can't, nwipe will never be able to determine the temperature. We already look in /sys/class/hwmon/hwmonX/device/block, /sys/class/hwmon/hwmonX/device/nvme/nvme0 and /sys/class/hwmon/hwmonX/device/. It's possible a fourth location exists that we are not looking in. With the drive connected you would need to look under every subdirectory in /sys/class/hwmon/hwmonX to find /dev/sda (assuming that's what the device is called). If the device exists then we look for the temperatures within that directory structure. If those temperatures exist then we have something to work with.

So the very first thing to do before looking at the code is to determine whether the temperatures for /dev/sda are consistently appearing somewhere under /sys/class/hwmon. Without that information being present under /sys/class/hwmon you are wasting your time looking at the code. However, if that information is present then we are not looking in the correct place and a fourth path will need to be determined.

So the question is, "Are the temperatures for /dev/sda appearing under /sys/class/hwmon when you have that specific drive plugged in?" Depending upon the answer will determine what step is taken next.

To simplify that search when you are manually looking through /sys/class/hwmon, it will be much easier to have only that specific Seagate drive attached, because each additional drive adds another directory: /sys/class/hwmon/hwmonX becomes /sys/class/hwmon/hwmon1, /sys/class/hwmon/hwmon2, /sys/class/hwmon/hwmon3, and so on up to the number of drives. The temperature initialisation code will search any hwmonX directory; it doesn't matter whether it's hwmon0 or hwmonXYZ, they will all get searched, so the code isn't restricting itself to hwmon1-x.
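For anyone who prefers to script that manual search, here is a minimal sketch. It builds the three directories named above for a given hwmon index and checks whether one of them contains an entry named after the drive (for example "sda"). The helper names, the buffer size and the idea of matching by directory entry name are assumptions for illustration; this is not the code in nwipe_init_temperature().

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

#define HWMON_CANDIDATES 3
#define HWMON_PATH_MAX 128

/* Fill 'out' with the three directories currently searched for hwmon<index>. */
void hwmon_candidate_dirs( int index, char out[HWMON_CANDIDATES][HWMON_PATH_MAX] )
{
    assert( index >= 0 );
    snprintf( out[0], HWMON_PATH_MAX, "/sys/class/hwmon/hwmon%d/device/block", index );
    snprintf( out[1], HWMON_PATH_MAX, "/sys/class/hwmon/hwmon%d/device/nvme/nvme0", index );
    snprintf( out[2], HWMON_PATH_MAX, "/sys/class/hwmon/hwmon%d/device", index );
}

/* Return 1 if directory 'dir' holds an entry called 'name', otherwise 0.
 * A directory that does not exist or cannot be opened counts as "not found". */
int dir_contains_entry( const char* dir, const char* name )
{
    DIR* d;
    struct dirent* entry;
    int found = 0;

    assert( dir != NULL && name != NULL );

    d = opendir( dir );
    if( d == NULL )
    {
        return 0;
    }
    while( ( entry = readdir( d ) ) != NULL )
    {
        if( strcmp( entry->d_name, name ) == 0 )
        {
            found = 1;
            break;
        }
    }
    closedir( d );
    return found;
}
```

Looping the index from 0 upwards and calling dir_contains_entry() on each candidate would show quickly whether the drive appears in any of the known locations. If it appears in none of them, either a fourth location exists or hwmon has no data for the drive.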

Sorry, I just realised you are after serial number detection, not temperature detection! Serial number detection has a number of methods it uses to try and obtain that.

serial number determination

This function is where, having failed to get the serial number by other methods, we resort to smartctl -i: nwipe_get_device_bus_type_and_serialno

This function uses smartctl -i so I think you mentioned that smartctl -i, for this particular drive does not return a serial number but smartctl -x does?

If that's correct then I can modify the code to check using smartctl -x as well as, or maybe instead of, smartctl -i, which would then give us the serial number. I'm not sure why smartctl -i doesn't show the serial number for this particular drive when other drives are ok, but it's just another quirk I guess.

I'll make some changes if you could test it for me.

Also, looking at the smartctl output you provided. Do I also need to add some additional items to be anonymised:

Logical Unit id:
Serial number:
[2021/11/26 12:43:33]   debug: smartctl: smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.13.19] (local build) 
[2021/11/26 12:43:33]   debug: smartctl: Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org 
[2021/11/26 12:43:33]   debug: smartctl: === START OF INFORMATION SECTION === 
[2021/11/26 12:43:33]   debug: smartctl: Vendor:               SEAGATE 
[2021/11/26 12:43:33]   debug: smartctl: Product:              ST9300653SS 
[2021/11/26 12:43:33]   debug: smartctl: Revision:             YS0D 
[2021/11/26 12:43:33]   debug: smartctl: Compliance:           SPC-4 
[2021/11/26 12:43:33]   debug: smartctl: User Capacity:        300,000,000,000 bytes [300 GB] 
[2021/11/26 12:43:33]   debug: smartctl: Logical block size:   512 bytes 
[2021/11/26 12:43:33]   debug: smartctl: Rotation Rate:        15000 rpm 
[2021/11/26 12:43:33]   debug: smartctl: Form Factor:          2.5 inches 
[2021/11/26 12:43:33]   debug: smartctl: Logical Unit id:      [removed] 
[2021/11/26 12:43:33]   debug: smartctl: Serial number:        [shows Serial Number here, but I removed it] 
[2021/11/26 12:43:33]   debug: smartctl: Device type:          disk 
[2021/11/26 12:43:33]   debug: smartctl: Transport protocol:   SAS (SPL-3) 
[2021/11/26 12:43:33]   debug: smartctl: Local Time is:        Fri Nov 26 12:43:33 2021 UTC 
[2021/11/26 12:43:33]   debug: smartctl: SMART support is:     Available - device has SMART capability. 
[2021/11/26 12:43:33]   debug: smartctl: SMART support is:     Enabled 
[2021/11/26 12:43:33]   debug: smartctl: Temperature Warning:  Disabled or Not Supported [although smartctl -x shows the temperature?!?!]

I notice that the serial number label is slightly different between /dev/sda and /dev/sdb. It looks like it might be as simple as ignoring case, as /dev/sda has the label "Serial number" while /dev/sdb has the label "Serial Number". That 'n' may be why the serial number is missing.
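To make the case problem concrete, here is a small sketch that pulls the value out of a smartctl line while accepting both "Serial number:" and "Serial Number:". It is only an illustration of the idea under stated assumptions, not the code in device.c: the helper names are hypothetical and the serial values in the usage note are made-up placeholders.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if the first n characters of a and b match, ignoring case.
 * b must be at least n characters long. */
static int prefix_equal_nocase( const char* a, const char* b, size_t n )
{
    size_t i;
    for( i = 0; i < n; i++ )
    {
        if( a[i] == '\0' || tolower( (unsigned char) a[i] ) != tolower( (unsigned char) b[i] ) )
        {
            return 0;
        }
    }
    return 1;
}

/* Hypothetical helper: find "serial number:" anywhere in 'line' in any
 * letter case and copy the value that follows it, trimmed, into 'out'.
 * Returns 1 if the label was found, otherwise 0 (and out is empty). */
int extract_serial( const char* line, char* out, size_t outlen )
{
    static const char label[] = "serial number:";
    const size_t label_len = sizeof( label ) - 1;
    const char* p;
    size_t len;

    assert( line != NULL && out != NULL && outlen > 0 );

    for( p = line; *p != '\0'; p++ )
    {
        if( prefix_equal_nocase( p, label, label_len ) )
        {
            p += label_len;
            while( *p == ' ' || *p == '\t' )
            {
                p++; /* skip the padding after the colon */
            }
            len = strcspn( p, "\r\n" );
            while( len > 0 && ( p[len - 1] == ' ' || p[len - 1] == '\t' ) )
            {
                len--; /* drop trailing blanks */
            }
            if( len >= outlen )
            {
                len = outlen - 1; /* truncate to fit the buffer */
            }
            memcpy( out, p, len );
            out[len] = '\0';
            return 1;
        }
    }
    out[0] = '\0';
    return 0;
}
```

With this, "Serial number:        ABC123" and "Serial Number:    XYZ789" both yield a value, whereas a case-sensitive strstr() for "Serial Number" would miss the first.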

I'll check and patch that first. Are you still able to compile nwipe on your system with that particular drive or am I going to need to build a new shredos?

Here's the problem. strstr() should be strcasestr() in the following string comparison snippet: Line 707 of device.c

Note! If you are using Chrome, right click and open in a new tab rather than left clicking on the link. Right clicking opens the file at the specific highlighted line 707. Left clicking just opens the file at the beginning. At least that's how it works on my version of Chrome (Version 96.0.4664.93 (Official Build) (64-bit) Linux)

In fact all three strstr() functions in that section of code should be replaced with strcasestr(). Once I've written the code to extract smartctl data using json this section will need to be rewritten.

Just to confirm, was this drive /dev/sda Seagate ST9300653SS missing both serial number and temperature information?

This function uses smartctl -i so I think you mentioned that smartctl -i, for this particular drive does not return a serial number but smartctl -x does?

smartctl -i shows Serial Number. And smartctl -x as well. But nwipe doesn't.

Just to confirm, was this drive /dev/sda Seagate ST9300653SS missing both serial number and temperature information?

Yes the drive was the ST9300653SS.
Yes Serial Number and Temperature was missing in nwipe.

Are you still able to compile nwipe on your system with that particular drive or am I going to need to build a new shredos?

Yes I can still compile nwipe, but I was never successful compiling ShredOS.

Let's forget about the temp. problem for now as I was also misreading both the smartctl -i and -x logs. It says Temperature Warning is disabled, not Temperature:

Temperature Warning: Disabled or Not Supported

Should probably open a new Issue for the temperature because smartctl -x shows temp ( see https://github.com/martijnvanbrummelen/nwipe/files/7610701/nwipe-smartctl-temp.log ), so there must be a way to extract that with hwtemp.

@Firminator If serial number issue is not fixed by pull request #394 please reopen this issue.

@Firminator If you could obtain the full smartctl -x output, that contains the temperature information for that SAS drive. I'll need that info to be able to pull in the temperature data from smartctl as a fall back when hwmon doesn't provide it. Thanks.

@Firminator I did a v0.32.019 release with the serial number fix in my fork, if that helps: nwipe v0.32.019

Just for reference, in one of the earlier comments I mentioned using strcasestr as a case-insensitive replacement for strstr. However, strcasestr is a non-standard extension that isn't available, so I used a while loop and an if statement instead. It's more lines of code, but faster in terms of execution speed than creating duplicate strings and using tolower etc.
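For readers wondering what such a replacement can look like, below is one possible portable sketch of a case-insensitive substring search built from a while loop and an if statement. It is not the code from the pull request. The function name is hypothetical, and this version compares characters with tolower() in place without copying either string.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Return a pointer to the first occurrence of 'needle' in 'haystack',
 * ignoring letter case, or NULL if there is none. Standard C only. */
const char* find_nocase( const char* haystack, const char* needle )
{
    assert( haystack != NULL && needle != NULL );

    if( *needle == '\0' )
    {
        return haystack; /* same convention as strstr() for an empty needle */
    }

    while( *haystack != '\0' )
    {
        const char* h = haystack;
        const char* n = needle;

        /* Walk both strings while the characters match, ignoring case. */
        while( *h != '\0' && *n != '\0'
               && tolower( (unsigned char) *h ) == tolower( (unsigned char) *n ) )
        {
            h++;
            n++;
        }
        if( *n == '\0' )
        {
            return haystack; /* reached the end of needle: full match */
        }
        haystack++;
    }
    return NULL;
}
```

Called as find_nocase( line, "serial number" ), it matches both label spellings seen in this issue.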

Thanks for that link https://github.com/martijnvanbrummelen/nwipe/files/7610701/nwipe-smartctl-temp.log

Yes that will be very useful for extracting the current temperature. Interesting they use the term trip temperature. I wonder if that's the same as critical temperature? It's certainly the same sort of value. Using the term 'trip' is a bit odd. It almost sounds like the drive shuts down when it hits that temperature.
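A fallback that reads the temperature from smartctl output could be sketched roughly as follows. It assumes the SAS drive's smartctl -x output contains lines of the form "Current Drive Temperature: NN C" and "Drive Trip Temperature: NN C". Those labels and the numbers in the usage note are assumptions for illustration and should be checked against the attached log. The function name is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: if 'line' contains 'label', read the integer that
 * follows it into *celsius and return 1. Return 0 if the label is absent
 * or is not followed by a number. The match is case-sensitive. */
int parse_smartctl_temperature( const char* line, const char* label, int* celsius )
{
    const char* p;
    int value;

    assert( line != NULL && label != NULL && celsius != NULL );

    p = strstr( line, label );
    if( p == NULL )
    {
        return 0;
    }
    p += strlen( label );

    /* The leading space in the format skips the padding after the label. */
    if( sscanf( p, " %d", &value ) != 1 )
    {
        return 0;
    }
    *celsius = value;
    return 1;
}
```

Calling it once per output line with the "Current Drive Temperature:" label and once with the "Drive Trip Temperature:" label would give both a current and a critical-style value for the drive context.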

@Firminator I think I can kill two birds with one stone. Not only get this SAS drive's temperature fixed but also enable USB device temperatures to be displayed if a supported USB/SATA chipset is used in the adapter. I'm a bit busy tomorrow but I'll have a patch for the temperatures by Tuesday.

Awesome. Thanks for the patch. I will not be able to test though since I need ShredOS for the SAS drive (since I can't attach SAS drives natively to my laptop). Else I would need to go through the whole process of installing an OS in order to compile nwipe. This device doesn't have an OS. I always only boot ShredOS on it.

So the very first thing to do before looking at the code is to determine whether the temperatures for /dev/sda are consistently appearing somewhere under /sys/class/hwmon. Without that information being present under /sys/class/hwmon you are wasting your time looking at the code. However, if that information is present then we are not looking in the correct place and a fourth path will need to be determined.

So the question is, "Are the temperatures for /dev/sda appearing under /sys/class/hwmon when you have that specific drive plugged in?" Depending upon the answer will determine what step is taken next.

Do you still need me to check /sys/class/hwmon out for this drive?

Awesome. Thanks for the patch. I will not be able to test though since I need ShredOS for the SAS drive (since I can't attach SAS drives natively to my laptop). Else I would need to go through the whole process of installing an OS in order to compile nwipe. This device doesn't have an OS. I always only boot ShredOS on it.

No problem. Re ShredOS, I'll do the temperature patch first, then roll both into a new version of ShredOS towards the end of this week.

Do you still need me to check /sys/class/hwmon out for this drive?

That might be useful, yes. If there's no temperature data for that SAS drive in there, then you could open an issue with the maintainer of the drivetemp module: drivetemp

@Firminator I have an old Maxtor 80GB 6Y080P0 (IDE interface on an old Pentium4) that also doesn't show the temperature. Investigating, I found that drivetemp doesn't recognise it and therefore doesn't populate the hwmon directory, but smartctl -x does retrieve the temperature. I think I'll add this as a new issue in nwipe but also raise it as an issue in drivetemp, which is the root cause of the problem. The quick fix is for me to retrieve the smartctl -x temperature data; the same patch will fix your SAS drive temperature problem.

Hm, did you see the open issue over there @ https://github.com/groeck/drivetemp/issues/1
Could be related.
If drivetemp fails to read the temperature directly from firmware using SCT commands then it is supposed to fall back to getting the temperature from SMART (attribute 194), but the maintainer mentions that this is buggy or doesn't work. I don't have a full understanding of the issue.

Also note there is a comment in https://github.com/groeck/drivetemp/blob/master/drivetemp.c that some Maxtor drives (which?) don't report temperature correctly:

 Known exceptions (from libatasmart):
 * - SAMSUNG SV0412H and SAMSUNG SV1204H) report the temperature in 10th
 *   degrees C in the first two raw bytes.
 * - A few Maxtor drives report an unknown or bad value in attribute 194.

As mentioned in the drivetemp issue, I don't have a drive which only supports SMART, so I can't test what may be wrong with the code. Patches would be appreciated. The comments regarding the "few Maxtor drives" are, as mentioned, from libatasmart. I don't have any additional information.