oetiker/SmokePing

Slave graphs all targets... except one?

Closed this issue · 1 comments

Hey guys, I have a master SmokePing server and two slave servers (slave1, slave2).

Slave1 graphs successfully to all our targets.
Slave2 graphs successfully to all targets - EXCEPT one target.

I'm scratching my head on this one. Here are the things that I've confirmed:

In the master config file, both slaves are set for all targets. Here is what the problem target looks like (it is the same as all the others):

----------------------------------
+ mysite1
menu = Management VLAN
title = Core switch in each site
alerts = someloss

slaves = slave1 slave2
-------------------------------------------

I've run a permission check on the appropriate folders:

# file: var/www/html/smokeping/cache
# owner: apache
# group: apache
user::rwx
group::r-x
other::r-x

getfacl: Removing leading '/' from absolute path names
# file: var/www/html/smokeping/data
# owner: apache
# group: apache
user::rwx
group::r-x
other::r-x

getfacl: Removing leading '/' from absolute path names
# file: opt/smokeping/var
# owner: root
# group: root
user::rwx
group::r-x
other::r-x

Besides, if there were an issue with permissions, wouldn't ALL graphs have problems?

I can see the RRD files being created on the master server for both slaves. Here is a snippet from running ls -la on the problem target's data folder (in var/www/html/smokeping/data):

-rw-r--r--  1 root   root   2986808 May 28 11:24 stlouis~slave1.rrd
-rw-r--r--  1 apache apache     417 May 28 11:26 stlouis.slave1.slave_cache
-rw-r--r--  1 root   root   2986808 May 28 11:24 stlouis~slave2.rrd
-rw-r--r--  1 apache apache     100 May 28 11:26 stlouis.slave2.slave_cache
-rw-rw-rw-  1 apache apache 2986808 May 28 11:24 stlouis.rrd
-rw-r--r--  1 root   root   2986808 May 28 11:24 tampa~slave1.rrd
-rw-r--r--  1 apache apache     417 May 28 11:26 tampa.slave1-1.slave_cache
-rw-r--r--  1 root   root   2986808 May 28 11:24 tampa~slave2.rrd
-rw-r--r--  1 apache apache     100 May 28 11:26 tampa.slave2.slave_cache
-rw-rw-rw-  1 apache apache 2986808 May 28 11:24 tampa.rrd
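
A listing like this only shows that the files exist and are being touched. RRD files are preallocated at a fixed size, so identical sizes and fresh timestamps do not tell you whether real samples are arriving. Below is a minimal sketch for checking that, assuming the text being parsed is the output of `rrdtool fetch <file> AVERAGE`. The helper name `unknown_row_ratio` and the sample values are made up for illustration and are not taken from the real files.

```python
def unknown_row_ratio(fetch_output: str) -> float:
    """Return the fraction of data rows whose values are all NaN (unknown)."""
    total = 0
    unknown = 0
    for line in fetch_output.splitlines():
        # Data rows look like "1527520800: 1.2e-02 -nan"; the header has no colon.
        timestamp, sep, values = line.partition(":")
        if not sep or not timestamp.strip().isdigit():
            continue
        fields = values.split()
        if not fields:
            continue
        total += 1
        # rrdtool prints unknown samples as "nan" or "-nan".
        if all(f.lstrip("+-").lower() == "nan" for f in fields):
            unknown += 1
    return unknown / total if total else 0.0


# Illustrative sample only (assumed layout, invented numbers).
SAMPLE = """\
                         loss           median

1527520800: -nan -nan
1527521100: -nan -nan
1527521400: 0.0000000000e+00 1.2300000000e-02
"""

if __name__ == "__main__":
    print(unknown_row_ratio(SAMPLE))
```

In practice the text would come from running `rrdtool fetch` against one of the `~slave2.rrd` files. A ratio close to 1.0 for recent rows would suggest the slave is reporting but has no real measurements to send.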

Is there anything I'm missing here, guys?

I found the issue after comparing the debug logs from both slaves. The DNS settings on slave2 were misconfigured: FPing could not reach any of our internal management IPs, although it had no trouble with outside IPs. I was bamboozled by seeing the RRDs populate in real time, but I guess slave2 was just sending blank data. Once I fixed the DNS settings, the RRDs received real data and began graphing.
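
For anyone hitting something similar, a quick check that can be run on each slave is whether the target host names resolve there at all. This is a rough sketch, assuming the targets are configured by host name. The function name `check_resolution` and the host names in the usage section are hypothetical placeholders, not values from this setup.

```python
import socket
from typing import Callable, Dict, Iterable, Optional


def check_resolution(
    hosts: Iterable[str],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> Dict[str, Optional[str]]:
    """Map each host to its resolved IPv4 address, or None if the lookup fails."""
    results: Dict[str, Optional[str]] = {}
    for host in hosts:
        try:
            results[host] = resolve(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            results[host] = None
    return results


if __name__ == "__main__":
    # Hypothetical names; substitute the host values from your own Targets section.
    hosts = ["core-sw.mysite1.example", "example.com"]
    for host, addr in check_resolution(hosts).items():
        print(f"{host}: {addr or 'FAILED'}")
```

Running the same script on slave1 and slave2 and comparing the output should show quickly whether one slave fails on internal names while outside names still resolve.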