paregupt/ucs_traffic_monitor

New 4.2(2c) Domain Added to Config, but not loading in UCS Monitor


We've added a newly built 4.2(2c) UCS domain and are unable to get it to load properly in UCS Monitor. The logs show the following errors at boot. It appears that Python is not able to launch the primary Python script correctly. Our other 6 UCS domains are all loading properly. We have 1 stale entry that was entered in the past but has not been purged from the system yet, and this new domain will not load at all.

I've gone through the Telegraf.conf file, and everything is set correctly there. The file path and file name are correct. The URL, username, and password in the ucs_domains_group_1.txt file are correct.

All are running locally on UCS Manager with a UCS Central. No IMM here.

4 domains working, running 4.1(3h)
1 domain working, that was already in UCS Monitor, then firmware upgraded from 4.1(3h) to 4.2(2c)
1 domain not working, that is newly built as 4.2(2c)

Any suggestions?

Log file

2023-04-11T00:20:09Z E! [inputs.exec] Error in plugin: exec: signal: terminated for command 'python3 /usr/local/telegraf/ucs_traffic_monitor.py /usr/local/telegraf/ucs_domains_group_1.txt influxdb-lp -vv':
2023-04-11T00:20:09Z I! [agent] Hang on, flushing any cached metrics before shutdown
2023-04-11T00:20:09Z I! [agent] Stopping running outputs
2023-04-11T00:20:45Z I! Loaded inputs: cpu disk diskio exec (2x) kernel mem net processes swap system
2023-04-11T00:20:45Z I! Loaded aggregators:
2023-04-11T00:20:45Z I! Loaded processors:
2023-04-11T00:20:45Z I! Loaded outputs: influxdb
2023-04-11T00:20:45Z I! Tags enabled: host=ucsmon01
2023-04-11T00:20:45Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"ucsmon01", Flush Interval:10s
2023-04-11T00:20:45Z W! [outputs.influxdb] When writing to [http://localhost:8086]: database "telegraf" creation failed: Post "http://localhost:8086/query": dial tcp [::1]:8086: connect: connection refused
2023-04-11T00:23:50Z E! [inputs.exec] Error in plugin: exec: command timed out for command 'python3 /usr/local/telegraf/ucs_traffic_monitor.py /usr/local/telegraf/ucs_domains_group_1.txt influxdb-lp -vv':

Any help is greatly appreciated.
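The `command timed out` line in the log suggests that the collector did not finish within the limit Telegraf allows for the exec input. One way to narrow this down is to run the same command by hand under a timeout and see whether it exits with an error or simply runs too long. The following is a minimal sketch, not part of UTM: `run_with_timeout` is a hypothetical helper, and the timeout value is a placeholder that should be set to the `timeout` configured in the `[[inputs.exec]]` section of the Telegraf config.

```python
import subprocess
import time


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (status, elapsed_seconds).

    status is the process exit code, or the string "timeout" if the
    command was still running after timeout_s seconds (the child is
    killed in that case).
    """
    start = time.monotonic()
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,  # discard the line-protocol output
            stderr=subprocess.PIPE,     # keep stderr from filling the terminal
            timeout=timeout_s,
        )
        status = proc.returncode
    except subprocess.TimeoutExpired:
        status = "timeout"
    return status, time.monotonic() - start


# Example call, using the command taken from the log (the 60 s value is an
# assumption, not the value from this installation):
#
# cmd = ["python3", "/usr/local/telegraf/ucs_traffic_monitor.py",
#        "/usr/local/telegraf/ucs_domains_group_1.txt", "influxdb-lp", "-vv"]
# print(run_with_timeout(cmd, 60))
```

A non-zero exit code that comes back quickly points at the script or its Python environment, while a `"timeout"` status points at slow or unreachable domains in the input file.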

I am able to successfully SSH from the UTM host to the UCS Manager in question. Not sure if I'm running into the same issue as #94 or not. This UCS Monitor has been running for a couple of years.

Since at least one domain is working with 4.2(2c), for now let's assume that firmware is compatible with UTM.

Are all these domains in the same input file or in separate files?

I need to look at the complete logs for further analysis. Feel free to email your log file to my Cisco email, which is the same as my GitHub ID.

Email sent. Your help is greatly appreciated.

@paregupt

Upgrading Python to 3.7, installing UCSMSDK and NETMIKO, and modifying the config file to use Python 3.7 resolved our issues. We have both 4.1(3h) and 4.2(2d) running with this new Python 3.7 install with no issues.

Our hiccup during the install/upgrade was that we were not calling pip3.7 specifically when installing UCSMSDK and NETMIKO, so the packages did not end up under the Python 3.7 interpreter.
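For anyone hitting the same pip mismatch, a quick check is to ask the exact interpreter that Telegraf calls whether it can see the two packages. This is a minimal sketch, not part of UTM: `missing_modules` is a hypothetical helper, and it assumes the packages import under the names `ucsmsdk` and `netmiko`. Run it with the same Python binary that appears in the Telegraf exec command.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Show which interpreter is actually running, since pip and python
    # can point at different installs.
    print("interpreter:", sys.executable)
    print("version:", ".".join(str(part) for part in sys.version_info[:3]))

    missing = missing_modules(["ucsmsdk", "netmiko"])
    if missing:
        print("not importable from this interpreter:", ", ".join(missing))
    else:
        print("ucsmsdk and netmiko are importable")
```

If a package shows up as missing here even though pip reported a successful install, the install most likely went to a different Python version. Installing through the interpreter itself with `-m pip install` avoids the ambiguity.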

7 Domains (4x at 4.1(3h) and 3x at 4.2(2d))
49 Chassis
307 Servers
2 Locations