radiorabe/rabe-zabbix

Faulty drive/degraded state doesn't trigger anything

Hyrla opened this issue · 4 comments

Hyrla commented

Hello,

I set up this wonderful plugin successfully. However, no alerts are triggered, even when I manually set a disk to the faulty state.

My Zabbix agent runs in passive mode, so I just changed "Zabbix Agent (active)" to "Zabbix Agent" in the discovery template settings.

Here's the output of a few debug commands:

su -c 'zabbix_agentd -t rabe.raid.md.raid-device.discovery' -s /bin/bash zabbix
rabe.raid.md.raid-device.discovery [t|{"data":[{"{#MD_RAID_RAID_DEV_NAME}":"md0"}]}]
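
To double-check which arrays the discovery rule reports, the line printed by `zabbix_agentd -t` can be unpacked as below. This is only a sketch: `parse_discovery_line` is a hypothetical helper, and it assumes the output keeps the `key [t|<json>]` shape shown above.

```python
import json


def parse_discovery_line(line: str) -> list:
    """Extract the discovered MD device names from one `zabbix_agentd -t` line."""
    # The JSON payload sits between the "[t|" prefix and the final "]".
    start = line.index("[t|") + len("[t|")
    end = line.rindex("]")
    payload = json.loads(line[start:end])
    # Each discovery row maps the LLD macro to one array name.
    return [row["{#MD_RAID_RAID_DEV_NAME}"] for row in payload["data"]]
```

For the output above, this returns a list containing only `md0`, so the array itself is discovered correctly.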

mdadm -D /dev/md0

/dev/md0:
           Version : 1.2
     Creation Time : Fri Mar 27 19:42:13 2020
        Raid Level : raid5
        Array Size : 3906762752 (3725.78 GiB 4000.53 GB)
     Used Dev Size : 1953381376 (1862.89 GiB 2000.26 GB)
      Raid Devices : 3
     Total Devices : 3
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Sun Mar 29 13:25:00 2020
             State : clean, degraded 
    Active Devices : 2
   Working Devices : 2
    Failed Devices : 1
     Spare Devices : 0

            Layout : left-symmetric
        Chunk Size : 512K

Consistency Policy : unknown

              Name : scarif:0  (local to host scarif)
              UUID : a54c09d4:af8606cc:97bc4ab1:5d7c5f77
            Events : 23977

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       -       0        0        1      removed
       3       8       33        2      active sync   /dev/satac1

       1       8       17        -      faulty   /dev/satab1
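
The relevant facts in that output (array state, failed device count, faulty members) can be pulled out with a small parser. This is a rough sketch that assumes the `mdadm -D` layout shown above. `summarize_mdadm_detail` is a hypothetical helper, not part of the plugin.

```python
import re

# Device table rows: Number, Major, Minor, RaidDevice, state words, /dev path.
_DEVICE_ROW = re.compile(r"^\s*(\d+|-)\s+\d+\s+\d+\s+(\d+|-)\s+(.+?)\s+(/dev/\S+)$")


def summarize_mdadm_detail(text: str) -> dict:
    """Collect array state, failed device count and faulty members from `mdadm -D` text."""
    summary = {"state": [], "failed_devices": None, "faulty": []}
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("State :"):
            # e.g. "State : clean, degraded" -> ["clean", "degraded"]
            summary["state"] = [s.strip() for s in stripped.split(":", 1)[1].split(",")]
        elif stripped.startswith("Failed Devices :"):
            summary["failed_devices"] = int(stripped.split(":", 1)[1])
        else:
            match = _DEVICE_ROW.match(line)
            if match and "faulty" in match.group(3).split():
                summary["faulty"].append(match.group(4))
    return summary
```

Fed the output above, it reports the states `clean` and `degraded`, one failed device, and `/dev/satab1` as the faulty member.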

My host's items discovered screenshot
https://i.ibb.co/FVcQpMM/Capture-d-cran-de-2020-03-29-17-04-49.png

My host's triggers for MD RAID active
https://i.ibb.co/g9qs4rj/Capture-d-cran-de-2020-03-29-17-08-21.png

Am I doing anything wrong?

Thank you :)

Is the array showing up as degraded in /sys/block/md0/md/degraded?

Hyrla commented

@hairmare yes it is: the file exists and its content is "1".
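
For anyone repeating this check, here is a minimal sketch that reads the same sysfs values directly. It assumes the kernel's md sysfs layout, with `md/degraded` holding the degraded count and one `md/dev-<name>/state` file per member. The thread does not confirm that the template's items read exactly these files. `read_md_status` and its `sysfs_root` parameter are hypothetical names.

```python
from pathlib import Path


def read_md_status(md_name: str, sysfs_root: str = "/sys/block") -> dict:
    """Return the degraded count and per-member states for one md array."""
    md_dir = Path(sysfs_root) / md_name / "md"
    # "degraded" holds the number of missing or failed members (0 when healthy).
    degraded = int((md_dir / "degraded").read_text().strip())
    # Each member still attached to the array has a "dev-<name>" directory.
    components = {}
    for dev_dir in sorted(md_dir.glob("dev-*")):
        state_file = dev_dir / "state"
        if state_file.is_file():
            components[dev_dir.name[len("dev-"):]] = state_file.read_text().strip()
    return {"degraded": degraded, "components": components}
```

Calling `read_md_status("md0")` on the affected host should show a degraded count of 1 if sysfs matches the `mdadm -D` output. If the agent returns something different for the same items, the problem is on the agent side rather than in the triggers.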

@Hyrla sorry for the late reply, I overlooked this one.

A degraded drive should fire at least two triggers:

  • RAID array device MD {#MD_RAID_RAID_DEV_NAME} has {ITEM.VALUE1} degraded device(s) on {HOST.NAME}
  • RAID component device MD {#MD_RAID_RAID_DEV_NAME} RD {#MD_RAID_COMPONENT_DEV_NAME} is in {ITEM.VALUE1} state on {HOST.NAME}

In your case above:

  • RAID array device MD md0 has 1 degraded device(s) on {HOST.NAME}
  • RAID component device MD md0 RD satab1 is in faulty state on {HOST.NAME}
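
The expanded names above are just the trigger templates with the macros filled in. The sketch below illustrates that substitution. It uses plain string replacement as a stand-in for Zabbix's own macro expansion, and `expand_trigger_name` is a hypothetical helper. The Zabbix host name is not given in the thread, so `{HOST.NAME}` is left unexpanded.

```python
def expand_trigger_name(template: str, macros: dict) -> str:
    """Replace each macro token in a trigger name with its value."""
    for macro, value in macros.items():
        template = template.replace(macro, value)
    return template


# Values taken from the mdadm output earlier in this thread.
array_trigger = expand_trigger_name(
    "RAID array device MD {#MD_RAID_RAID_DEV_NAME} has {ITEM.VALUE1} "
    "degraded device(s) on {HOST.NAME}",
    {"{#MD_RAID_RAID_DEV_NAME}": "md0", "{ITEM.VALUE1}": "1"},
)
component_trigger = expand_trigger_name(
    "RAID component device MD {#MD_RAID_RAID_DEV_NAME} RD "
    "{#MD_RAID_COMPONENT_DEV_NAME} is in {ITEM.VALUE1} state on {HOST.NAME}",
    {
        "{#MD_RAID_RAID_DEV_NAME}": "md0",
        "{#MD_RAID_COMPONENT_DEV_NAME}": "satab1",
        "{ITEM.VALUE1}": "faulty",
    },
)
```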

Could you please post the relevant latest values of the affected host?