akpw/mktxp

wrong psu1_state,psu2_state

Closed this issue · 10 comments

I'm getting 0 for these stats, but from what I see in the code, the value should be 1 when the state is OK.

akpw commented

@smitas3400 can you pls provide terminal listing for this one

In Prometheus I get mktxp_system_psu1_state{instance="10.10.100.101:49090", job="mktxp", routerboard_address="ipadresa", routerboard_name="giraite_pusyno_spinta"} | 0
But if I understand correctly, it should be 1 if the PSU state is "ok" and 0 if it is "fail"?
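
The expected mapping can be sketched as below. This only illustrates the behavior described in this comment and is not mktxp's actual code; the helper name `psu_state_to_gauge` is hypothetical, and treating every non-"ok" value as 0 is an assumption.

```python
def psu_state_to_gauge(state: str) -> float:
    """Map a RouterOS health state string to a gauge value (hypothetical helper)."""
    # Assumption: "ok" maps to 1.0 and anything else (e.g. "fail") maps to 0.0.
    return 1.0 if state.strip().lower() == "ok" else 0.0


print(psu_state_to_gauge("ok"))    # 1.0
print(psu_state_to_gauge("fail"))  # 0.0
```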

akpw commented

and what do you get in your router terminal, can you share the result of system/health/print detail?
Also, use wget or a browser to access mktxp_ip_address:49090 and then check for / share the psu1_state metrics from there?
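
As an alternative to wget or a browser, the same check can be scripted. This is a minimal sketch using only the Python standard library; `fetch_metrics` and `metric_lines` are hypothetical helper names, and `mktxp_ip_address` is a placeholder for the real exporter address.

```python
from urllib.request import urlopen


def fetch_metrics(url: str, timeout: float = 10.0) -> str:
    """Download the raw exposition text from the exporter."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def metric_lines(exposition_text: str, metric_name: str) -> list[str]:
    """Return the sample lines (not the # HELP / # TYPE comments) for one metric."""
    result = []
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first "{" (labels) or space (value).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        if name == metric_name:
            result.append(line)
    return result
```

For example, `metric_lines(fetch_metrics("http://mktxp_ip_address:49090"), "mktxp_system_psu1_state")` would list every psu1_state sample the exporter currently serves.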

mikrotik terminal
0 name="sfp-temperature" value=49 type=C

1 name="switch-temperature" value=48 type=C

2 name="fan-state" value=ok type=""

3 name="fan1-speed" value=4080 type=RPM

4 name="fan2-speed" value=4125 type=RPM

5 name="fan3-speed" value=4125 type=RPM

6 name="psu1-state" value=ok type=""

7 name="psu2-state" value=ok type=""

in mktxp

# HELP mktxp_system_psu1_state System PSU1 state
# TYPE mktxp_system_psu1_state gauge
mktxp_system_psu1_state{routerboard_address="10.0.226.7",routerboard_name="giraite_pusyno_spinta"} 0.0
# HELP mktxp_system_psu2_state System PSU2 state
# TYPE mktxp_system_psu2_state gauge
mktxp_system_psu2_state{routerboard_address="10.0.226.7",routerboard_name="giraite_pusyno_spinta"} 0.0
# HELP mktxp_system_switch_temperature Current switch temperature
# TYPE mktxp_system_switch_temperature gauge
mktxp_system_switch_temperature{routerboard_address="10.0.226.7",routerboard_name="giraite_pusyno_spinta"} 48.0
# HELP mktxp_system_psu1_state System PSU1 state
# TYPE mktxp_system_psu1_state gauge
mktxp_system_psu1_state{routerboard_address="10.0.226.7",routerboard_name="giraite_pusyno_spinta"} 0.0
# HELP mktxp_system_psu2_state System PSU2 state
# TYPE mktxp_system_psu2_state gauge
mktxp_system_psu2_state{routerboard_address="10.0.226.7",routerboard_name="giraite_pusyno_spinta"} 0.0
# HELP mktxp_system_psu1_state System PSU1 state
# TYPE mktxp_system_psu1_state gauge
mktxp_system_psu1_state{routerboard_address="10.0.226.7",routerboard_name="giraite_pusyno_spinta"} 0.0
# HELP mktxp_system_psu2_state System PSU2 state
# TYPE mktxp_system_psu2_state gauge
mktxp_system_psu2_state{routerboard_address="10.0.226.7",routerboard_name="giraite_pusyno_spinta"} 0.0
# HELP mktxp_system_fan_one_speed System fan 1 current speed
# TYPE mktxp_system_fan_one_speed gauge
mktxp_system_fan_one_speed{routerboard_address="10.0.226.7",routerboard_name="giraite_pusyno_spinta"} 4050.0
# HELP mktxp_system_psu1_state System PSU1 state
# TYPE mktxp_system_psu1_state gauge
mktxp_system_psu1_state{routerboard_address="10.0.226.7",routerboard_name="giraite_pusyno_spinta"} 0.0
# HELP mktxp_system_psu2_state System PSU2 state
# TYPE mktxp_system_psu2_state gauge
mktxp_system_psu2_state{routerboard_address="10.0.226.7",routerboard_name="giraite_pusyno_spinta"} 0.0
# HELP mktxp_system_fan_two_speed System fan 2 current speed
# TYPE mktxp_system_fan_two_speed gauge
mktxp_system_fan_two_speed{routerboard_address="10.0.226.7",routerboard_name="giraite_pusyno_spinta"} 4125.0
# HELP mktxp_system_psu1_state System PSU1 state
# TYPE mktxp_system_psu1_state gauge
mktxp_system_psu1_state{routerboard_address="10.0.226.7",routerboard_name="giraite_pusyno_spinta"} 0.0
# HELP mktxp_system_psu2_state System PSU2 state
# TYPE mktxp_system_psu2_state gauge
mktxp_system_psu2_state{routerboard_address="10.0.226.7",routerboard_name="giraite_pusyno_spinta"} 0.0
# HELP mktxp_system_fan_three_speed System fan 3 current speed
# TYPE mktxp_system_fan_three_speed gauge
mktxp_system_fan_three_speed{routerboard_address="10.0.226.7",routerboard_name="giraite_pusyno_spinta"} 4125.0
# HELP mktxp_system_psu1_state System PSU1 state
# TYPE mktxp_system_psu1_state gauge
mktxp_system_psu1_state{routerboard_address="10.0.226.7",routerboard_name="giraite_pusyno_spinta"} 0.0
# HELP mktxp_system_psu2_state System PSU2 state
# TYPE mktxp_system_psu2_state gauge
mktxp_system_psu2_state{routerboard_address="10.0.226.7",routerboard_name="giraite_pusyno_spinta"} 0.0
# HELP mktxp_system_psu1_state System PSU1 state
# TYPE mktxp_system_psu1_state gauge
mktxp_system_psu1_state{routerboard_address="10.0.226.7",routerboard_name="giraite_pusyno_spinta"} 1.0
# HELP mktxp_system_psu2_state System PSU2 state
# TYPE mktxp_system_psu2_state gauge
mktxp_system_psu2_state{routerboard_address="10.0.226.7",routerboard_name="giraite_pusyno_spinta"} 0.0
# HELP mktxp_system_psu1_state System PSU1 state
# TYPE mktxp_system_psu1_state gauge
mktxp_system_psu1_state{routerboard_address="10.0.226.7",routerboard_name="giraite_pusyno_spinta"} 0.0
# HELP mktxp_system_psu2_state System PSU2 state
# TYPE mktxp_system_psu2_state gauge
mktxp_system_psu2_state{routerboard_address="10.0.226.7",routerboard_name="giraite_pusyno_spinta"} 1.0

I just don't understand why there are so many entries for the PSU state; some show the normal status, others only 0.
Also, Prometheus has started raising the alert below; it began when I launched mktxp in my monitoring stack.

name: PrometheusTargetScrapeDuplicate
expr: increase(prometheus_target_scrapes_sample_duplicate_timestamp_total[5m]) > 0
labels:
  severity: warning
  source: prometheus
annotations:
  description: Prometheus has many samples rejected due to duplicate timestamps but different values
    VALUE = {{ $value }}
    LABELS = {{ $labels }}
  summary: Prometheus target scrape duplicate (instance {{ $labels.instance }})

And there are multiple psu_state entries for all the MikroTiks.
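
To confirm the duplication independently of Prometheus, the scraped text can be checked for series that appear more than once. The sketch below is an illustration, not part of mktxp; `find_duplicate_series` is a hypothetical name, and it assumes sample lines carry no trailing timestamp, as in the output pasted above.

```python
from collections import defaultdict


def find_duplicate_series(exposition_text: str) -> dict[str, list[str]]:
    """Return every series (metric name + labels) that occurs more than once,
    mapped to the list of values it was exported with."""
    values = defaultdict(list)
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Assumption: no timestamps, so the text after the last space is the value.
        series, _, value = line.rpartition(" ")
        if series:
            values[series].append(value)
    return {series: vals for series, vals in values.items() if len(vals) > 1}
```

A series that comes back with differing values, such as `['0.0', '1.0']`, is the kind of duplicate sample that the alert above reports.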

akpw commented

can you try out now with the latest?

still same

# HELP mktxp_system_psu1_state System PSU1 state
# TYPE mktxp_system_psu1_state gauge
mktxp_system_psu1_state{routerboard_address="10.0.226.7",routerboard_name="giraite_pusyno_spinta"} 0.0
# HELP mktxp_system_psu2_state System PSU2 state
# TYPE mktxp_system_psu2_state gauge
mktxp_system_psu2_state{routerboard_address="10.0.226.7",routerboard_name="giraite_pusyno_spinta"} 0.0
# HELP mktxp_system_switch_temperature Current switch temperature
# TYPE mktxp_system_switch_temperature gauge
mktxp_system_switch_temperature{routerboard_address="10.0.226.7",routerboard_name="giraite_pusyno_spinta"} 47.0
# HELP mktxp_system_psu1_state System PSU1 state
# TYPE mktxp_system_psu1_state gauge
mktxp_system_psu1_state{routerboard_address="10.0.226.7",routerboard_name="giraite_pusyno_spinta"} 0.0
# HELP mktxp_system_psu2_state System PSU2 state
# TYPE mktxp_system_psu2_state gauge
mktxp_system_psu2_state{routerboard_address="10.0.226.7",routerboard_name="giraite_pusyno_spinta"} 0.0
# HELP mktxp_system_psu1_state System PSU1 state
# TYPE mktxp_system_psu1_state gauge
mktxp_system_psu1_state{routerboard_address="10.0.226.7",routerboard_name="giraite_pusyno_spinta"} 0.0
# HELP mktxp_system_psu2_state System PSU2 state
# TYPE mktxp_system_psu2_state gauge
mktxp_system_psu2_state{routerboard_address="10.0.226.7",routerboard_name="giraite_pusyno_spinta"} 0.0
# HELP mktxp_system_fan_one_speed System fan 1 current speed
# TYPE mktxp_system_fan_one_speed gauge
mktxp_system_fan_one_speed{routerboard_address="10.0.226.7",routerboard_name="giraite_pusyno_spinta"} 4065.0
# HELP mktxp_system_psu1_state System PSU1 state
# TYPE mktxp_system_psu1_state gauge
mktxp_system_psu1_state{routerboard_address="10.0.226.7",routerboard_name="giraite_pusyno_spinta"} 0.0
# HELP mktxp_system_psu2_state System PSU2 state
# TYPE mktxp_system_psu2_state gauge
mktxp_system_psu2_state{routerboard_address="10.0.226.7",routerboard_name="giraite_pusyno_spinta"} 0.0
# HELP mktxp_system_fan_two_speed System fan 2 current speed
# TYPE mktxp_system_fan_two_speed gauge
mktxp_system_fan_two_speed{routerboard_address="10.0.226.7",routerboard_name="giraite_pusyno_spinta"} 4125.0
# HELP mktxp_system_psu1_state System PSU1 state
# TYPE mktxp_system_psu1_state gauge
mktxp_system_psu1_state{routerboard_address="10.0.226.7",routerboard_name="giraite_pusyno_spinta"} 0.0
# HELP mktxp_system_psu2_state System PSU2 state
# TYPE mktxp_system_psu2_state gauge
mktxp_system_psu2_state{routerboard_address="10.0.226.7",routerboard_name="giraite_pusyno_spinta"} 0.0
# HELP mktxp_system_fan_three_speed System fan 3 current speed
# TYPE mktxp_system_fan_three_speed gauge
mktxp_system_fan_three_speed{routerboard_address="10.0.226.7",routerboard_name="giraite_pusyno_spinta"} 4110.0
# HELP mktxp_system_psu1_state System PSU1 state
# TYPE mktxp_system_psu1_state gauge
mktxp_system_psu1_state{routerboard_address="10.0.226.7",routerboard_name="giraite_pusyno_spinta"} 0.0
# HELP mktxp_system_psu2_state System PSU2 state
# TYPE mktxp_system_psu2_state gauge
mktxp_system_psu2_state{routerboard_address="10.0.226.7",routerboard_name="giraite_pusyno_spinta"} 0.0
# HELP mktxp_system_psu1_state System PSU1 state
# TYPE mktxp_system_psu1_state gauge
mktxp_system_psu1_state{routerboard_address="10.0.226.7",routerboard_name="giraite_pusyno_spinta"} 1.0
# HELP mktxp_system_psu2_state System PSU2 state
# TYPE mktxp_system_psu2_state gauge
mktxp_system_psu2_state{routerboard_address="10.0.226.7",routerboard_name="giraite_pusyno_spinta"} 0.0
# HELP mktxp_system_psu1_state System PSU1 state
# TYPE mktxp_system_psu1_state gauge
mktxp_system_psu1_state{routerboard_address="10.0.226.7",routerboard_name="giraite_pusyno_spinta"} 0.0
# HELP mktxp_system_psu2_state System PSU2 state
# TYPE mktxp_system_psu2_state gauge

akpw commented

not much difference in the code regarding how various health metrics are retrieved, so just wondering -- is it happening on multiple devices with psu1-state available, or specifically on a given device / environment / configuration?

It is happening on all devices, including devices that don't have psu_state.
My config:
[MKTXP]
listen = '0.0.0.0:49090' # Space separated list of socket addresses to listen to, both IPV4 and IPV6
socket_timeout = 5

initial_delay_on_failure = 120
max_delay_on_failure = 900
delay_inc_div = 5

bandwidth = False                # Turns metrics bandwidth metrics collection on / off
bandwidth_test_interval = 600    # Interval for collecting bandwidth metrics
minimal_collect_interval = 5     # Minimal metric collection interval

verbose_mode = True             # Set it on for troubleshooting

fetch_routers_in_parallel = True   # Fetch metrics from multiple routers in parallel / sequentially
max_worker_threads = 5              # Max number of worker threads that can fetch routers (parallel fetch only)
max_scrape_duration = 30            # Max duration of individual routers' metrics collection (parallel fetch only)
total_max_scrape_duration = 90      # Max overall duration of all metrics collection (parallel fetch only)

compact_default_conf_values = False  # Compact mktxp.conf, so only specific values are kept on the individual routers' level

[akademija_gw]
hostname = 213.226.176.218

[domeikava_gw]
hostname = 213.226.176.222

[ezerelis_gw]
hostname = xxx.xxx.xxx.xxx

[linksmakalnis_gw]
hostname = xxx.xxx.xxx.xxx

[raudonvaris_gw]
hostname = xxx.xxx.xxx.xxx

[ziezmariai_hq]
hostname = xxx.xxx.xxx.xxx

[giraite_pusyno_spinta]
hostname = xxx.xxx.xxx.xxx

[kaunas_lubinu_spinta]
hostname = xxx.xxx.xxx.xxx

[uzliedziai_pieniu_1_spinta]
hostname = xxx.xxx.xxx.xxx

[uzliedziai_pieniu_36_spinta]
hostname = xxx.xxx.xxx.xxx

[default]
ipsec = False
wireless_clients = False
monitor = True
use_comments_over_names = True
connection_stats = False
check_for_updates = False
wireless = False
capsman = False
ssl_certificate_verify = False
neighbor = False
user = True
plaintext_login = True
dhcp = False
interface = True
kid_control_dynamic = False
capsman_clients = False
netwatch = False
poe = False
use_ssl = False
ipv6_neighbor = False
connections = False
no_ssl_certificate = False
pool = False
ipv6_firewall = False
enabled = True
public_ip = False
kid_control_assigned = False
lte = False
firewall = False
ipv6_pool = False
installed_packages = True
dhcp_lease = False
queue = False
route = False
bgp = False
ipv6_route = False
switch_port = False
hostname = localhost
username = xxx.xxx.xxx.xxx
password = xxx.xxx.xxx.xxx
remote_dhcp_entry = None
remote_capsman_entry = None
port = 8728

akpw commented

yes I did another change that should help, can you try out with the latest now?

OK, now it's fixed :) Only devices that have psu_state show these stats, and there are no more duplicates :)