Provider-ansible is leaking zombies
Closed this issue · 0 comments
d-honeybadger commented
What happened?
Child processes of ansible-playbook (and, I imagine, other Ansible processes) aren't reaped properly and end up as zombies:
# ps aux | grep ' [Z]'
2000 46125 0.0 0.0 0 0 ? Zs 21:07 0:00 [ssh] <defunct>
2000 46126 0.0 0.0 0 0 ? Z 21:07 0:00 [ssh] <defunct>
2000 46127 0.0 0.0 0 0 ? Zs 21:07 0:00 [ssh] <defunct>
2000 46128 0.0 0.0 0 0 ? Z 21:07 0:00 [ssh] <defunct>
2000 46130 0.0 0.0 0 0 ? Zs 21:07 0:00 [ssh] <defunct>
2000 46131 0.0 0.0 0 0 ? Z 21:07 0:00 [ssh] <defunct>
2000 46132 0.0 0.0 0 0 ? Zs 21:07 0:00 [ssh] <defunct>
2000 46133 0.0 0.0 0 0 ? Z 21:07 0:00 [ssh] <defunct>
2000 46141 0.0 0.0 0 0 ? Zs 21:07 0:00 [ssh] <defunct>
2000 46142 0.0 0.0 0 0 ? Z 21:07 0:00 [ssh] <defunct>
2000 46143 0.0 0.0 0 0 ? Zs 21:07 0:00 [ssh] <defunct>
2000 46144 0.0 0.0 0 0 ? Z 21:07 0:00 [ssh] <defunct>
2000 46146 0.0 0.0 0 0 ? Zs 21:07 0:00 [ssh] <defunct>
2000 46147 0.0 0.0 0 0 ? Z 21:07 0:00 [ssh] <defunct>
2000 46148 0.0 0.0 0 0 ? Zs 21:07 0:00 [ssh] <defunct>
2000 46149 0.0 0.0 0 0 ? Z 21:07 0:00 [ssh] <defunct>
2000 46151 0.0 0.0 0 0 ? Zs 21:07 0:00 [ssh] <defunct>
2000 46152 0.0 0.0 0 0 ? Z 21:07 0:00 [ssh] <defunct>
2000 46153 0.0 0.0 0 0 ? Zs 21:07 0:00 [ssh] <defunct>
2000 46154 0.0 0.0 0 0 ? Z 21:07 0:00 [ssh] <defunct>
2000 46203 0.0 0.0 0 0 ? Zs 21:07 0:00 [ssh] <defunct>
2000 46204 0.0 0.0 0 0 ? Z 21:07 0:00 [ssh] <defunct>
2000 46205 0.0 0.0 0 0 ? Zs 21:07 0:00 [ssh] <defunct>
2000 46206 0.0 0.0 0 0 ? Z 21:07 0:00 [ssh] <defunct>
2000 48634 0.0 0.0 0 0 ? Zs 21:10 0:00 [ssh] <defunct>
2000 48635 0.0 0.0 0 0 ? Z 21:10 0:00 [ssh] <defunct>
2000 48636 0.0 0.0 0 0 ? Zs 21:10 0:00 [ssh] <defunct>
2000 48637 0.0 0.0 0 0 ? Z 21:10 0:00 [ssh] <defunct>
2000 48638 0.0 0.0 0 0 ? Zs 21:10 0:00 [ssh] <defunct>
...
As a result, they keep accumulating until the process limit (ulimit) is hit, at which point provider-ansible stops functioning until the pod is deleted and recreated.
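This must be due to the lack of an init process in the container: orphaned processes are re-parented to PID 1 of the container's PID namespace, and nothing there reaps them. It should be solvable by running dumb-init or a similar minimal init as the first process (PID 1) in the container, which would reap orphaned processes.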
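The underlying mechanism can be reproduced without Ansible. Below is a minimal POSIX sh sketch, assuming Linux with /proc mounted; the helper names (proc_state, children_of, make_zombie) are made up for illustration. make_zombie starts a parent that forks a short-lived child and then exec's into sleep, so nothing ever calls wait() on that child: once the child exits, it stays in state Z. When such a parent exits in turn, the zombie is re-parented to PID 1 of the PID namespace (unless a child subreaper is set), and only a PID 1 that actually reaps, such as dumb-init or tini, clears it.

```shell
#!/bin/sh
# Illustrative zombie demo (Linux, requires /proc). Helper names are hypothetical.

# proc_state PID: print the one-letter state (R, S, Z, ...) from /proc/PID/status.
proc_state() {
  awk '/^State:/ { print $2 }' "/proc/$1/status" 2>/dev/null
}

# children_of PID: print the PIDs of all processes whose parent is PID.
# In /proc/*/status the "Pid:" line always precedes "PPid:".
children_of() {
  for f in /proc/[0-9]*/status; do
    # Processes may exit between the glob and the read; ignore those errors.
    awk -v want="$1" '/^Pid:/ { pid = $2 } /^PPid:/ { if ($2 == want) print pid }' "$f" 2>/dev/null
  done
}

# make_zombie: start a parent that forks `sleep 1` and then exec's into
# `sleep 5`. The new program never calls wait(), so the finished child
# lingers as a zombie. Prints the parent's PID.
make_zombie() {
  # Redirect output so a caller using $(make_zombie) does not block
  # waiting for the background job to close the pipe.
  ( sleep 1 & exec sleep 5 ) >/dev/null 2>&1 &
  echo "$!"
}
```

For example, after parent=$(make_zombie) and a two-second pause, proc_state "$(children_of "$parent")" should print Z; three seconds later the parent exits and the zombie is passed on to whichever process is PID 1.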
How can we reproduce it?
Create some Ansible runs and watch ps aux | grep ' [Z]' in the pod or on the pod's node.
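To tell whether the zombies are orphans stuck on PID 1, rather than children of a still-running ansible-playbook that simply has not waited yet, it can help to group them by parent PID. A small sketch in POSIX sh that reads /proc directly, so it should also work in images that ship without procps; zombies_by_parent is a hypothetical helper name, not part of any tool.

```shell
#!/bin/sh
# zombies_by_parent: print one "PPID COUNT" line for every parent process
# that currently has zombie (state Z) children.
zombies_by_parent() {
  for f in /proc/[0-9]*/status; do
    # Emit the parent PID of each zombie. Processes may vanish between the
    # glob and the read, so errors are discarded.
    awk '/^State:/ { s = $2 } /^PPid:/ { p = $2 } END { if (s == "Z") print p }' "$f" 2>/dev/null
  done | sort | uniq -c | awk '{ print $2, $1 }'
}
```

Run periodically during Ansible runs, e.g. while sleep 60; do date; zombies_by_parent; done. Inside the pod, a count under PPID 1 that keeps growing would match the missing-init explanation; on the node, the same parent shows up under its host PID instead.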
What environment did it happen in?
Crossplane version: 1.14.0