crossplane-contrib/provider-ansible

Provider-ansible is leaking zombies


What happened?

Child processes of ansible-playbook (and, I imagine, of other Ansible processes) aren't reaped properly and end up as zombies:

# ps aux | grep ' [Z]'
2000       46125  0.0  0.0      0     0 ?        Zs   21:07   0:00 [ssh] <defunct>
2000       46126  0.0  0.0      0     0 ?        Z    21:07   0:00 [ssh] <defunct>
2000       46127  0.0  0.0      0     0 ?        Zs   21:07   0:00 [ssh] <defunct>
2000       46128  0.0  0.0      0     0 ?        Z    21:07   0:00 [ssh] <defunct>
2000       46130  0.0  0.0      0     0 ?        Zs   21:07   0:00 [ssh] <defunct>
2000       46131  0.0  0.0      0     0 ?        Z    21:07   0:00 [ssh] <defunct>
2000       46132  0.0  0.0      0     0 ?        Zs   21:07   0:00 [ssh] <defunct>
2000       46133  0.0  0.0      0     0 ?        Z    21:07   0:00 [ssh] <defunct>
2000       46141  0.0  0.0      0     0 ?        Zs   21:07   0:00 [ssh] <defunct>
2000       46142  0.0  0.0      0     0 ?        Z    21:07   0:00 [ssh] <defunct>
2000       46143  0.0  0.0      0     0 ?        Zs   21:07   0:00 [ssh] <defunct>
2000       46144  0.0  0.0      0     0 ?        Z    21:07   0:00 [ssh] <defunct>
2000       46146  0.0  0.0      0     0 ?        Zs   21:07   0:00 [ssh] <defunct>
2000       46147  0.0  0.0      0     0 ?        Z    21:07   0:00 [ssh] <defunct>
2000       46148  0.0  0.0      0     0 ?        Zs   21:07   0:00 [ssh] <defunct>
2000       46149  0.0  0.0      0     0 ?        Z    21:07   0:00 [ssh] <defunct>
2000       46151  0.0  0.0      0     0 ?        Zs   21:07   0:00 [ssh] <defunct>
2000       46152  0.0  0.0      0     0 ?        Z    21:07   0:00 [ssh] <defunct>
2000       46153  0.0  0.0      0     0 ?        Zs   21:07   0:00 [ssh] <defunct>
2000       46154  0.0  0.0      0     0 ?        Z    21:07   0:00 [ssh] <defunct>
2000       46203  0.0  0.0      0     0 ?        Zs   21:07   0:00 [ssh] <defunct>
2000       46204  0.0  0.0      0     0 ?        Z    21:07   0:00 [ssh] <defunct>
2000       46205  0.0  0.0      0     0 ?        Zs   21:07   0:00 [ssh] <defunct>
2000       46206  0.0  0.0      0     0 ?        Z    21:07   0:00 [ssh] <defunct>
2000       48634  0.0  0.0      0     0 ?        Zs   21:10   0:00 [ssh] <defunct>
2000       48635  0.0  0.0      0     0 ?        Z    21:10   0:00 [ssh] <defunct>
2000       48636  0.0  0.0      0     0 ?        Zs   21:10   0:00 [ssh] <defunct>
2000       48637  0.0  0.0      0     0 ?        Z    21:10   0:00 [ssh] <defunct>
2000       48638  0.0  0.0      0     0 ?        Zs   21:10   0:00 [ssh] <defunct>
...

As a result, they keep accumulating until the process limit (ulimit) is hit, at which point provider-ansible stops functioning until the pod is deleted and recreated.
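
Why the accumulation eventually breaks the provider can be checked directly. A zombie keeps its process-table entry until it is reaped, so it keeps counting against the per-UID process limit (RLIMIT_NPROC, which ulimit -u reports) and, where a pids cgroup limit is set, typically against pids.max as well. Below is a minimal Linux-only sketch comparing current usage with those limits. count_uid_procs is a hypothetical helper, and the two cgroup paths are the common v2 and v1 locations, which may differ on a given node.

```shell
#!/bin/sh
# Linux-only sketch: compare this UID's process count (zombies included)
# with the limits that zombies keep counting against until reaped.

# Count processes, zombies included, whose real UID is $1.
count_uid_procs() {
  want=$1 n=0
  for f in /proc/[0-9]*/status; do
    # The "Uid:" line lists real, effective, saved and filesystem UIDs.
    ruid=$(awk '$1 == "Uid:" { print $2; exit }' "$f" 2>/dev/null) || continue
    if [ "$ruid" = "$want" ]; then n=$((n + 1)); fi
  done
  echo "$n"
}

uid=$(id -u)
echo "processes for uid $uid: $(count_uid_procs "$uid")"
# The soft limit is the 3rd field of the "Max processes" row.
echo "RLIMIT_NPROC soft limit: $(awk '/^Max processes/ { print $3 }' /proc/self/limits)"
# Assumed cgroup v2 and v1 locations; which one exists depends on the node.
for p in /sys/fs/cgroup/pids.max /sys/fs/cgroup/pids/pids.max; do
  if [ -r "$p" ]; then echo "$p: $(cat "$p")"; fi
done
```

The count only covers processes visible in the current PID namespace, whereas the kernel's RLIMIT_NPROC accounting covers every process of that UID in the user namespace, which without user namespaces usually means the whole node.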

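This must be due to the lack of an init process in the container: orphaned children are reparented to PID 1, and the process running as PID 1 (presumably the provider binary itself) never reaps them. It should be solvable by adding dumb-init or similar as the container's first process, which reaps orphaned processes.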
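
To illustrate the mechanism, here is a minimal Linux-only shell sketch (not taken from provider-ansible): a parent that never calls wait() leaves its exited child in state Z, the same state as the [ssh] entries in the ps output above. count_zombie_children is a hypothetical helper; it reads /proc directly so it does not depend on ps being installed in the image.

```shell
#!/bin/sh
# Linux-only sketch: create one zombie on purpose and count it via /proc.

# Print how many zombie (state Z) children PID $1 currently has.
count_zombie_children() {
  target=$1 n=0
  for f in /proc/[0-9]*/stat; do
    line=$(cat "$f" 2>/dev/null) || continue  # process may have exited
    rest=${line##*\)}   # drop "pid (comm)"; comm itself may contain spaces
    set -- $rest        # now $1 = state, $2 = ppid
    if [ "$1" = Z ] && [ "$2" = "$target" ]; then n=$((n + 1)); fi
  done
  echo "$n"
}

# The inner shell backgrounds `sleep 1`, then exec's into `sleep 3`.
# `sleep` never calls wait(), so once `sleep 1` exits it stays a zombie
# for as long as its parent is alive.
sh -c 'sleep 1 & exec sleep 3' &
parent=$!
sleep 2
before=$(count_zombie_children "$parent")
echo "zombie children of $parent: $before"

# When the parent exits, the zombie is reparented to PID 1 (or the nearest
# child subreaper). A real init such as dumb-init or tini reaps it; a PID 1
# that never calls wait() on unknown children leaves it in the process table.
wait "$parent"
```

The background child is made to outlive the exec (it sleeps for a second instead of exiting immediately) so the inner shell cannot reap it before turning into sleep, which keeps the demonstration deterministic.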

How can we reproduce it?

Trigger a few Ansible runs, then repeatedly run ps aux | grep ' [Z]' in the pod or on the pod's node and watch the number of zombie processes grow.
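
To track the growth over time instead of eyeballing ps output, the sampling step can be scripted. This is a minimal Linux-only sketch; count_zombies and sample_zombies are hypothetical helpers that count every process in state Z visible in the current PID namespace, the same processes the grep ' [Z]' filter picks out.

```shell
#!/bin/sh
# Linux-only sketch: sample the number of zombie processes over time.

# Count every visible process whose state (after "pid (comm)") is Z.
count_zombies() {
  n=0
  for f in /proc/[0-9]*/stat; do
    line=$(cat "$f" 2>/dev/null) || continue  # process may have exited
    rest=${line##*\)}   # drop "pid (comm)"; comm itself may contain spaces
    set -- $rest        # now $1 = state
    if [ "$1" = Z ]; then n=$((n + 1)); fi
  done
  echo "$n"
}

# Print "HH:MM:SS zombies=N" $1 times, $2 seconds apart.
sample_zombies() {
  samples=$1 interval=$2 i=0
  while [ "$i" -lt "$samples" ]; do
    echo "$(date +%H:%M:%S) zombies=$(count_zombies)"
    i=$((i + 1))
    if [ "$i" -lt "$samples" ]; then sleep "$interval"; fi
  done
}
```

For example, after sourcing the script inside the pod, sample_zombies 30 10 prints thirty samples ten seconds apart while Ansible runs are being triggered; a steadily rising count matches the behaviour described above.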

What environment did it happen in?

Crossplane version:
1.14.0