NagiosEnterprises/ncpa

non-ASCII character in service name is perhaps tripping up the api/services requests

Opened this issue · 6 comments

NCPA 3.1.1 on Windows Server 2019

  • In the web GUI, api/services shows only three dots and never loads any service content
  • In Nagios, the following status is returned for all service running checks for this server: "UNKNOWN: An error occurred connecting to API. (HTTP error: '500 INTERNAL SERVER ERROR')"
  • The offending service seems to have a non-ASCII character in both its Service Name and its Display Name
  • Logs from the ncpa_listener log file are below ("example.com" replaces the real server name)


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "flask\app.py", line 1473, in wsgi_app
File "flask\app.py", line 882, in full_dispatch_request
File "flask\app.py", line 880, in full_dispatch_request
File "flask\app.py", line 865, in dispatch_request
File "listener\server.py", line 317, in token_auth_decoration
File "listener\server.py", line 1507, in api
File "listener\services.py", line 354, in run_check
File "listener\services.py", line 18, in wrapper
File "listener\services.py", line 112, in get_services_via_psutil
File "psutil\_pswindows.py", line 628, in status
File "psutil\_pswindows.py", line 558, in _query_status
File "contextlib.py", line 158, in __exit__
File "psutil\_pswindows.py", line 583, in _wrap_exceptions
psutil.NoSuchProcess: service 'SNAP Live²' does not exist (name='SNAP Live²')
2024-10-30 15:26:53,115 listener INFO before_request() - request.url: https://snaps:5693/api/disk/logical/C:|/used_percent/?token=********&warning=80&critical=90&check=1
2024-10-30 15:27:11,553 listener INFO before_request() - request.url: https://snaps:5693/api/cpu/count/?token=********&check=1
2024-10-30 15:27:32,803 listener INFO before_request() - request.url: https://snaps:5693/api/system/version/?token=********&check=1
2024-10-30 15:27:33,897 listener INFO Did not receive normal values. Unable to find meaningful check.
2024-10-30 15:27:49,834 listener INFO before_request() - request.url: https://example.com:5693/gui/
2024-10-30 15:27:54,084 listener INFO before_request() - request.url: https://example.com:5693/gui/checks
2024-10-30 15:27:58,194 listener INFO before_request() - request.url: https://example.com:5693/gui/api
2024-10-30 15:27:58,412 listener INFO before_request() - request.url: https://example.com:5693/api
2024-10-30 15:28:01,397 listener INFO before_request() - request.url: https://example.com:5693/api/services
2024-10-30 15:28:01,491 listener.server ERROR Exception on /api/services [GET]
Traceback (most recent call last):
File "psutil\_pswindows.py", line 570, in _wrap_exceptions
File "psutil\_pswindows.py", line 559, in _query_status
OSError: [WinError 1060] The specified service does not exist as an installed service: '(originated from OpenService)'

NCPA was built to support any UTF-8 text (UTF-8 is a superset of ASCII), so it will likely break if the input to a check is not valid UTF-8. That said, the ² character can be encoded in UTF-8, so it shouldn't break anything unless there is an encoding issue somewhere along the way. I will investigate when I get the opportunity.
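For illustration only, here is a small sketch of what such an encoding issue could look like. It is not a confirmed diagnosis of this bug; it only shows that '²' is fine in UTF-8 by itself and that trouble starts when bytes written in one encoding are read back in another. The choice of cp1252 as the "other" encoding is an assumption (it is a common Windows code page), and `try_decode` is a hypothetical helper.

```python
# The service name from the log above.
name = "SNAP Live²"

# '²' is U+00B2: two bytes (C2 B2) in UTF-8, one byte (B2) in cp1252.
utf8_bytes = name.encode("utf-8")
cp1252_bytes = name.encode("cp1252")


def try_decode(raw, encoding):
    """Decode raw bytes, returning a marker string instead of raising."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"<decode failed: {exc.reason}>"


# Matching encodings round-trip cleanly.
print(try_decode(utf8_bytes, "utf-8"))
print(try_decode(cp1252_bytes, "cp1252"))

# Mismatched encodings either garble the name or fail outright,
# and a garbled name no longer matches the installed service.
print(try_decode(utf8_bytes, "cp1252"))
print(try_decode(cp1252_bytes, "utf-8"))
```

If the name gets garbled anywhere between enumerating the services and opening one of them, a "service does not exist" error would be a plausible result, but that remains to be verified.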

I have a similar problem with a Windows server running some stupid software with umlauts in its service name.
The result of check_ncpa.py requesting the "services" module is:
UNKNOWN: An error occurred connecting to API. (HTTP error: '500 INTERNAL SERVER ERROR')

NCPA-Version 3.1.0

Out of curiosity, what output do you get for the following in PowerShell on an affected system?

$OutputEncoding = [System.Text.Encoding]::UTF8
get-service "service name for the given service"
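As a cross-check from the Python side, a sketch like the one below could list which service names contain non-ASCII characters and the exact code points involved, which may help when comparing against the PowerShell output. `non_ascii_code_points` is a hypothetical helper, not part of NCPA or psutil; the only psutil call used is `psutil.win_service_iter()`, which is available on Windows only.

```python
import sys


def non_ascii_code_points(names):
    """Return {name: ['U+00B2', ...]} for names with non-ASCII characters."""
    report = {}
    for name in names:
        points = [f"U+{ord(ch):04X}" for ch in name if ord(ch) > 127]
        if points:
            report[name] = points
    return report


if __name__ == "__main__" and sys.platform == "win32":
    import psutil  # third-party; service enumeration is Windows-only

    service_names = [svc.name() for svc in psutil.win_service_iter()]
    for name, points in non_ascii_code_points(service_names).items():
        print(f"{name!r}: {', '.join(points)}")
```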

I'm thinking this is likely a problem with the psutil module.

The code path in NCPA is straightforward, with no UTF string translations occurring.

for service in psutil.win_service_iter():
    name = service.name()
    if service.status() == 'running':
        services[name] = 'running'
    else:
        services[name] = 'stopped'

We essentially get a psutil Windows service object, and the error is raised when reading the service status from that object. NCPA itself does not change any variables along the way.

if service.status() == 'running':
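As a possible stopgap while the root cause is tracked down, the loop could tolerate a failure on a single service instead of letting it turn the whole api/services request into a 500. This is only a sketch under assumptions, not the actual NCPA code or a proposed patch: `collect_service_states`, the `ignorable` parameter, and the 'unknown' state are all hypothetical names introduced here. It relies only on the `name()` and `status()` methods already used above and on `psutil.NoSuchProcess`, the exception shown in the traceback.

```python
import sys


def collect_service_states(service_iter, ignorable=(Exception,)):
    """Map service name -> 'running', 'stopped', or 'unknown'.

    A failure while querying one service is recorded as 'unknown'
    (a hypothetical state) rather than aborting the whole listing.
    """
    states = {}
    for service in service_iter:
        name = service.name()
        try:
            status = service.status()
        except ignorable:
            # e.g. psutil.NoSuchProcess, as raised for 'SNAP Live²' in the log
            states[name] = "unknown"
            continue
        states[name] = "running" if status == "running" else "stopped"
    return states


if __name__ == "__main__" and sys.platform == "win32":
    import psutil  # third-party; win_service_iter() is Windows-only

    result = collect_service_states(
        psutil.win_service_iter(), ignorable=(psutil.NoSuchProcess,)
    )
    for name, state in sorted(result.items()):
        print(f"{state:8} {name!r}")
```

This would not fix the lookup for the affected service, but checks against every other service on the host would keep working.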

I checked the psutil history and don't see any known issues in the psutil versions released since NCPA 3.1.1 was deployed.

Thanks ... so how do we move this forwards?
Current situation is that all service checks for an affected server fail (non-service checks via NCPA do work OK).
