shinken-solutions/shinken

poller problem - error worker.py action.py - privilege and Operation not permitted

ppj-0 opened this issue · 0 comments

ppj-0 commented

Hello,

I found a privilege problem that occurs when a worker/action (running as the shinken user) starts a check process through sudo (so the process runs as root). The poller log reports a worker crash, and the number of workers keeps increasing. After a while, once there are more than 8,000 workers, the poller still appears to be up but no longer runs any checks. Livestatus shows no state changes until the poller service is restarted. Incidents are no longer visible, and only the last-check date reveals that the last check ran several days ago.

Command: sudo /var/lib/shinken/libexec/check_ro_filesystem_by_ssh.py -H xxxxx -u root

Poller debug log:

[1549457111] DEBUG: [Shinken] ========================
[1549457111] DEBUG: [Shinken] [0][scheduler-master][fork] Stats: Workers:273 (Queued:11694 TotalReturnWait:48)
[1549457111] DEBUG: [Shinken] [0][scheduler-master][fork] Stats: Workers:659 (Queued:0 TotalReturnWait:48)
[1549457111] DEBUG: [Shinken] [0][scheduler-master][fork] Stats: Workers:694 (Queued:0 TotalReturnWait:48)
[1549457111] DEBUG: [Shinken] [0][scheduler-master][fork] Stats: Workers:708 (Queued:0 TotalReturnWait:48)
[1549457111] DEBUG: [Shinken] [0][scheduler-master][fork] Stats: Workers:721 (Queued:0 TotalReturnWait:48)
[1549457111] DEBUG: [Shinken] [0][scheduler-master][fork] Stats: Workers:725 (Queued:0 TotalReturnWait:48)
[1549457111] DEBUG: [Shinken] [0][scheduler-master][fork] Stats: Workers:729 (Queued:0 TotalReturnWait:48)
[1549457111] DEBUG: [Shinken] [0][scheduler-master][fork] Stats: Workers:730 (Queued:0 TotalReturnWait:48)
[1549457111] DEBUG: [Shinken] [0][scheduler-master][fork] Stats: Workers:733 (Queued:0 TotalReturnWait:48)
[1549457111] DEBUG: [Shinken] [0][scheduler-master][fork] Stats: Workers:736 (Queued:0 TotalReturnWait:48)
[1549457111] DEBUG: [Shinken] [0][scheduler-master][fork] Stats: Workers:737 (Queued:0 TotalReturnWait:48)
[1549457111] DEBUG: [Shinken] [0][scheduler-master][fork] Stats: Workers:742 (Queued:0 TotalReturnWait:48)
[1549457111] DEBUG: [Shinken] [0][scheduler-master][fork] Stats: Workers:743 (Queued:0 TotalReturnWait:48)
[1549457111] DEBUG: [Shinken] [0][scheduler-master][fork] Stats: Workers:744 (Queued:0 TotalReturnWait:48)
[1549457111] DEBUG: [Shinken] [0][scheduler-master][fork] Stats: Workers:745 (Queued:0 TotalReturnWait:48)
[1549457111] DEBUG: [Shinken] [0][scheduler-master][fork] Stats: Workers:747 (Queued:0 TotalReturnWait:48)
[1549457111] DEBUG: [Shinken] [0][scheduler-master][fork] Stats: Workers:748 (Queued:0 TotalReturnWait:48)
[1549457111] DEBUG: [Shinken] [0][scheduler-master][fork] Stats: Workers:749 (Queued:0 TotalReturnWait:48)
[1549457111] DEBUG: [Shinken] [0][scheduler-master][fork] Stats: Workers:750 (Queued:0 TotalReturnWait:48)
[1549457111] DEBUG: [Shinken] [0][scheduler-master][fork] Stats: Workers:751 (Queued:0 TotalReturnWait:48)
[1549457111] DEBUG: [Shinken] [0][scheduler-master][fork] Stats: Workers:752 (Queued:0 TotalReturnWait:48)
[1549457111] DEBUG: [Shinken] [0][scheduler-master][fork] Stats: Workers:753 (Queued:0 TotalReturnWait:48)
[1549457111] DEBUG: [Shinken] [0][scheduler-master][fork] Stats: Workers:754 (Queued:0 TotalReturnWait:48)
[1549457111] DEBUG: [Shinken] [0][scheduler-master][fork] Stats: Workers:755 (Queued:0 TotalReturnWait:48)
[1549457111] DEBUG: [Shinken] Wait ratio: 1.967326
[1549457111] DEBUG: [Shinken] [poller-0952] Trying to adjust worker number. Actual number : 24, min per module : 24, max per module : 24
[1549457112] DEBUG: [Shinken] Debug perf: ping [args:1.09672546387e-05] [aqu_lock:2.14576721191e-06][calling:4.05311584473e-06] [json:1.78813934326e-05] [global:3.50475311279e-05]
[1549457112] DEBUG: [Shinken] Debug perf: get_external_commands [args:7.15255737305e-06] [aqu_lock:9.53674316406e-07][calling:3.50475311279e-05] [json:1.09672546387e-05] [global:5.41210174561e-05]
[1549457112] DEBUG: [Shinken] Ask actions to 0, got 49
[1549457112] DEBUG: [Shinken] Debug perf: ping [args:1.4066696167e-05] [aqu_lock:9.53674316406e-07][calling:3.09944152832e-06] [json:1.50203704834e-05] [global:3.31401824951e-05]
[1549457112] DEBUG: [Shinken] HTTP: calling lock for get_broks
[1549457112] DEBUG: [Shinken] Posting to http://10.30.250.118:7768/put_results: 35248B
[1549457112] DEBUG: [Shinken] Loop turn
[1549457112] DEBUG: [Shinken] Debug perf: get_broks [args:7.82012939453e-05] [aqu_lock:0.0503358840942][calling:0.000340938568115] [json:4.6968460083e-05] [global:0.0508019924164]
[1549457113] DEBUG: [Shinken] Debug perf: ping [args:1.19209289551e-05] [aqu_lock:2.14576721191e-06][calling:4.76837158203e-06] [json:1.90734863281e-05] [global:3.79085540771e-05]
[1549457113] DEBUG: [Shinken] Debug perf: get_external_commands [args:8.10623168945e-06] [aqu_lock:9.53674316406e-07][calling:4.00543212891e-05] [json:1.38282775879e-05] [global:6.29425048828e-05]
[1549457113] DEBUG: [Shinken] Debug perf: ping [args:1.09672546387e-05] [aqu_lock:1.90734863281e-06][calling:4.05311584473e-06] [json:2.09808349609e-05] [global:3.79085540771e-05]
[1549457113] DEBUG: [Shinken] HTTP: calling lock for get_broks
[1549457113] DEBUG: [Shinken] Debug perf: get_broks [args:8.10623168945e-05] [aqu_lock:0.000241041183472][calling:0.000185966491699] [json:1.69277191162e-05] [global:0.000524997711182]
[1549457113] ERROR: [Shinken] Worker '742' exit with an unmanaged exception : Traceback (most recent call last):
  File "/products/python/python2.7.6/lib/python2.7/site-packages/Shinken-2.4-py2.7.egg/shinken/worker.py", line 227, in work
    self.do_work(s, returns_queue, c)
  File "/products/python/python2.7.6/lib/python2.7/site-packages/Shinken-2.4-py2.7.egg/shinken/worker.py", line 274, in do_work
    self.manage_finished_checks()
  File "/products/python/python2.7.6/lib/python2.7/site-packages/Shinken-2.4-py2.7.egg/shinken/worker.py", line 186, in manage_finished_checks
    action.check_finished(self.max_plugins_output_length)
  File "/products/python/python2.7.6/lib/python2.7/site-packages/Shinken-2.4-py2.7.egg/shinken/action.py", line 185, in check_finished
    self.kill__()
  File "/products/python/python2.7.6/lib/python2.7/site-packages/Shinken-2.4-py2.7.egg/shinken/action.py", line 329, in kill__
    os.killpg(self.process.pid, signal.SIGKILL)
OSError: [Errno 1] Operation not permitted

[1549457114] DEBUG: [Shinken] ========================
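
The traceback ends in `os.killpg(self.process.pid, signal.SIGKILL)` raising `OSError: [Errno 1]`, which is what an unprivileged user gets when it signals a process owned by root. A possible workaround, sketched below, would be to catch that error instead of letting it kill the worker. This is only an illustration and not Shinken's actual code: the helper name `kill_process_group` and its `killpg` parameter are hypothetical, and how the worker should then treat the unkillable check is left open.

```python
import errno
import os
import signal


def kill_process_group(pid, sig=signal.SIGKILL, killpg=os.killpg):
    """Signal the process group `pid` without raising on EPERM or ESRCH.

    Returns True if the signal was sent, False if the group no longer
    exists or the caller is not allowed to signal it (for example when
    the check was started through sudo and runs as root).
    The `killpg` argument is hypothetical and only exists so that the
    behaviour can be exercised without real processes.
    """
    try:
        killpg(pid, sig)
        return True
    except OSError as exc:
        # EPERM: process owned by another user; ESRCH: already gone.
        if exc.errno in (errno.EPERM, errno.ESRCH):
            return False
        # Anything else is unexpected and should still surface.
        raise
```

With a helper like this, `kill__` could log a warning when the result is `False` rather than crash the worker with an unmanaged exception. The root-owned check process would still be left running, so the sudo usage itself probably also needs attention.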