alphagov/paas-cf-conduit

Feature request: Kill the app if no CF tunnel is present

keymon opened this issue · 4 comments

In some circumstances the conduit proxy app might remain running forever.

For instance, this can happen if cf conduit fails, if the cf token expires, or if the user targets a different API endpoint, org or space.

If the application keeps running forever, the user might be billed for it indefinitely.

Proposed solution

One solution is to run some kind of watchdog command in the app that fails if no SSH tunnel is created or if the tunnel stops. For example, by dropping in a safe-terminate.sh:

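#!/bin/bash

# True while a TCP connection involving port 2222 (the container's sshd) exists.
# -n keeps ports numeric so that ":2222" is not replaced by a service name.
ssh_running() {
  netstat -tn | grep -q ":2222"
}

# Deadlines as epoch seconds (date -d is GNU date syntax).
start_timeout=$(date -d "+1 min" +%s)
max_timeout=$(date -d "+7 days" +%s)

# Phase 1: give the SSH session up to 1 minute to appear.
while ! ssh_running && [ "$(date +%s)" -lt "${start_timeout}" ] ; do
  echo "Waiting for SSH session..."
  sleep 5
done

# Phase 2: stay alive while the session exists, for at most 7 days.
while ssh_running && [ "$(date +%s)" -lt "${max_timeout}" ] ; do
  echo "SSH session still running..."
  sleep 600
done

echo "No more SSH sessions or timeout detected. Exiting in 10 seconds."
sleep 10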

Then cf conduit would push the static app as follows:

cf push -b staticfile_buildpack -m 64m -k 64m -i 1 --health-check-type none --no-route --no-manifest __conduit_12345__  -c 'bash $HOME/public/safe-terminate.sh'

The app would wait up to 1 minute for an SSH session to appear, and then wait until it is gone. It would then terminate after 10 seconds. In normal operation the flow would be as usual: the application would be deleted once conduit finishes.

If instead cf conduit fails to delete the app, the SSH tunnel would eventually die. The script would detect that and terminate. CF would restart the application several times, but each time the script would exit after one minute because no SSH connection is established.
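To make the two outcomes described above easier to try outside a CF container, here is a rough dry-run sketch. It is not the proposed script itself: ssh_running is stubbed with a marker file instead of netstat, the 7-day cap is left out, and the names MARKER and watchdog and both timing arguments are hypothetical, chosen only for illustration.

```shell
#!/bin/bash
# Dry-run sketch of the watchdog flow with short, caller-supplied timings.
# Assumption: a marker file stands in for "an SSH tunnel is connected".
MARKER="${MARKER:-/tmp/fake-ssh-session}"   # hypothetical stand-in path

ssh_running() {
  [ -e "$MARKER" ]
}

# watchdog START_WAIT_SECS POLL_SECS
# Prints "never-connected" (status 1) if no session appears in time,
# or "disconnected" (status 0) once an existing session goes away.
watchdog() {
  local start_wait="$1" poll="$2"
  local deadline=$(( $(date +%s) + start_wait ))

  # Phase 1: give the tunnel start_wait seconds to appear.
  while ! ssh_running && [ "$(date +%s)" -lt "$deadline" ]; do
    sleep "$poll"
  done
  if ! ssh_running; then
    echo "never-connected"
    return 1
  fi

  # Phase 2: stay alive for as long as the tunnel exists.
  while ssh_running; do
    sleep "$poll"
  done
  echo "disconnected"
}
```

Calling `watchdog 60 5` without ever creating the marker file corresponds to the restart case: the function gives up after about a minute, which is what each CF restart of the app would do.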

alext commented

I wonder whether we could push this as an app with 0 instances, and run it in a task to get around CF restarting it repeatedly... I don't know whether you can ssh into a task.

I don't know whether you can ssh into a task.

I don't believe one can:

  • some tests I've just done failed to get into a sleep 1000 task's container
  • the only way of referencing a specific container is via --app-instance-index, which bombs if it's not an Int, and which is (coarsely) not an Int for a task (versus an AI)
  • a task running ps axu doesn't show the diego-specific sshd which I believe is spawned in ssh-enabled AIs on container instantiation, not on cf ssh invocation

The only way I can see tasks being useful here is to embed the user's oauth-token in a task, and make it responsible for bringing up the conduit app and tearing it down afterwards, and then self-terminating. All doable, but a non-trivial re-work of how the system works at present.

I tried to do that (ssh into a task) but you cannot do it. The documentation actually says you cannot ssh into a task container.

https://docs.cloudfoundry.org/devguide/using-tasks.html

Note: You cannot SSH into the container running a task.

Hence my hacky solution. :)

Would be great to find a solution. Just saying.

name                   requested state   instances   memory   disk   urls
__conduit_ukntjyh5__   started           1/1         64M      256M
__conduit_6iceu7gn__   started           1/1         64M      256M
__conduit_s4ho3eed__   started           1/1         64M      256M
__conduit_cazlalzq__   started           1/1         64M      256M
__conduit_tkteitkd__   started           1/1         64M      256M
__conduit_yyhaur4n__   stopped           0/1         64M      256M
__conduit_rgews4a3__   started           1/1         64M      256M
__conduit_mduziycu__   started           1/1         64M      256M
__conduit_bfpi36jp__   started           1/1         64M      256M
__conduit_jhcdwi0q__   stopped           0/1         64M      256M
__conduit_96wbydvf__   started           1/1         64M      256M
__conduit_uv2z4rzf__   started           1/1         64M      256M
__conduit_vlrjhr10__   started           1/1         64M      256M
__conduit_h53lj7ql__   started           1/1         64M      256M
__conduit_46gemlny__   started           1/1         64M      256M
__conduit_8vvs22ty__   started           1/1         64M      256M
__conduit_jfclpm9c__   started           1/1         64M      256M
__conduit_byhx5rzh__   stopped           0/1         64M      256M
__conduit_cnc6ibuc__   started           1/1         64M      256M
__conduit_ci9nv6ub__   started           1/1         64M      256M
__conduit_drvjrmdw__   started           1/1         64M      256M
__conduit_9u02ohzj__   started           1/1         64M      256M
__conduit_85sv14rf__   started           1/1         64M      256M
__conduit_brm9pm5o__   started           1/1         64M      256M
__conduit_q50m22cl__   started           1/1         64M      256M
__conduit_43w232yv__   started           1/1         64M      256M
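Until there is a proper fix, leftovers like the ones listed above could be picked out of the `cf apps` output with a small filter. This is only a sketch: it assumes the `__conduit_<id>__` naming and the column layout shown in the listing, and the function names are made up for illustration.

```shell
#!/bin/bash
# Read `cf apps` output on stdin and print the names of conduit apps.
# Assumption: conduit apps are named __conduit_<id>__ (first column).
list_conduit_apps() {
  awk '$1 ~ /^__conduit_.*__$/ { print $1 }'
}

# Same, but only apps whose requested state (second column) is "started".
list_started_conduit_apps() {
  awk '$1 ~ /^__conduit_.*__$/ && $2 == "started" { print $1 }'
}

# Example usage (not run here). Review the names before deleting anything:
#   cf apps | list_started_conduit_apps
#   cf apps | list_conduit_apps | xargs -n 1 cf delete -f
```

Note that this cannot tell an orphaned conduit app from one that still has a live session, so it is best run when nobody in the space is using cf conduit.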