Please support drain scripts
Closed this issue · 5 comments
Is your feature request related to a problem? Please describe.
BOSH supports drain scripts, and I'd like to use one for some kubecf work (to dynamically create and remove application security group rules to support credhub).
Describe the solution you'd like
If a job script /var/vcap/jobs/…/bin/drain exists, I'd like it to be executed on pod termination.
I don't have an opinion on whether this should be triggered on SIGTERM of container-run, or as a preStop hook on the container.
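To make the SIGTERM option concrete, here is a minimal sketch of a supervising wrapper that runs a drain script, if one exists, before stopping the main process. It is only an illustration of the requested behaviour, not how container-run works; the function names `run_drain_if_present` and `supervise` are hypothetical, and the drain path is passed in rather than assumed.

```python
import os
import signal
import subprocess


def run_drain_if_present(drain_path, timeout=None):
    """Run the drain script if it exists and is executable.

    Returns the script's exit code, or None when there is nothing to run.
    """
    if not (os.path.isfile(drain_path) and os.access(drain_path, os.X_OK)):
        return None
    return subprocess.run([drain_path], timeout=timeout).returncode


def supervise(main_cmd, drain_path):
    """Start main_cmd; on SIGTERM, drain first, then stop the child.

    Must be called from the main thread, because Python only allows
    signal handlers to be installed there.
    """
    child = subprocess.Popen(main_cmd)

    def on_sigterm(signum, frame):
        # Drain while the main process is still running, then terminate it.
        run_drain_if_present(drain_path)
        child.terminate()

    signal.signal(signal.SIGTERM, on_sigterm)
    return child.wait()
```

With either approach, the drain has to complete before Kubernetes force-kills the container at the end of the pod's termination grace period, so a long-running drain would also need that period raised.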
Describe alternatives you've considered
I tried making an extra job whose run script triggers the drain script manually on shutdown. (Draining is idempotent in this case, anyway.) This triggers correctly, but the pod goes away before the script can finish.
I don't see anything in the docs that looks like it describes a timeout for shutdown. (They do reference hooks, but with the comment that this is not implemented.)
We have created an issue in Pivotal Tracker to manage this:
https://www.pivotaltracker.com/story/show/171870504
The labels on this github issue will be updated when the story is started.
@mook-as we have some tests for drain scripts (they should be working)
https://github.com/cloudfoundry-incubator/cf-operator/blob/3eafb4ed26e316fe32ea3b2e0d29afea756a4ff7/integration/lifecycle_test.go#L87
Do you have a sample or maybe a kubecf branch where you tried this out?
I'm currently working on mook-as/kubecf/credhub-sec-group-scf-helper + mook-as/scf-helper-release/kubecf/credhub-asgs; the relevant job is credhub-setup.
I added a temporary BOSH property, credhub_setup.use_drain, which if set to false will use the mentioned workaround of manually triggering the drain script on exit of the main run script. Neither works; with the workaround in place, you can see in the credhub-setup job (in either the uaa or credhub group) the drain exiting halfway. Without the workaround, the drain doesn't even get triggered. Either way, the expected behaviour on drain is that the cf security-group corresponding to the pod gets removed.
The most reliable way to test is probably:
- Deploy with sizing.credhub.instances = 2.
- Wait for cf security-groups to show the credhub-internal-kubecf-credhub-1 group get created.
- Scale credhub down to 1.
- Note that the logs don't show the drain being run (it should).
- Check that the credhub-internal-kubecf-credhub-1 security group still exists (it shouldn't).
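The two checks against cf security-groups in the steps above could be scripted roughly as follows. This is a sketch under assumptions: it treats the CLI output as whitespace-separated columns and looks for the group name as a whole token, without relying on any particular column layout. The helper names `group_listed` and `security_group_exists` are hypothetical.

```python
import subprocess


def group_listed(cli_output, group_name):
    """Return True if group_name appears as a whole token on any output line."""
    return any(group_name in line.split() for line in cli_output.splitlines())


def security_group_exists(group_name):
    """Run `cf security-groups` and look for group_name in its output.

    Assumes the cf CLI is on PATH and already logged in to the target.
    """
    result = subprocess.run(
        ["cf", "security-groups"], capture_output=True, text=True, check=True
    )
    return group_listed(result.stdout, group_name)
```

After scaling credhub down to 1, `security_group_exists("credhub-internal-kubecf-credhub-1")` would be expected to return False once the drain has run.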
It is possible that I just have a bug somewhere in my code, but I at least expected some output from my code.
@mook-as I can't comment on your workaround. Regarding the existing drain script support in the operator, I think this is the relevant code segment: https://github.com/cloudfoundry-incubator/cf-operator/blob/fb39a29ad746849d8d6cc4177f54a8ee6357dfe8/pkg/bosh/bpmconverter/container_factory.go#L552
Apparently we support multiple drain scripts in a directory named 'drain', which would explain why your script wasn't executed. Could you try putting your script(s) in /var/vcap/jobs/…/bin/drain/script1.sh?
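For illustration, the directory-based layout described here can be pictured as "run every executable file found in the drain directory". The sketch below is not the operator's code (see the linked container_factory.go for that). The sorted execution order, the helper names `find_drain_scripts` and `run_drain_scripts`, and the choice to keep going after a failing script are all assumptions.

```python
import os
import subprocess


def find_drain_scripts(drain_dir):
    """Return executable regular files directly inside drain_dir, sorted by name."""
    if not os.path.isdir(drain_dir):
        return []
    paths = (os.path.join(drain_dir, name) for name in sorted(os.listdir(drain_dir)))
    return [p for p in paths if os.path.isfile(p) and os.access(p, os.X_OK)]


def run_drain_scripts(drain_dir):
    """Run every drain script and return a {path: exit code} mapping."""
    return {p: subprocess.run([p]).returncode for p in find_drain_scripts(drain_dir)}
```

Under this model a single file named drain is not a directory, so `find_drain_scripts` returns an empty list and nothing runs, which matches the behaviour reported above.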