dnephin/dobi

Job `command`s that are too short cause errors

aidan-mundy opened this issue · 1 comments

Job commands that cause the container to exit too quickly cause errors like this one:

[ERROR] signal=window changed error=API error (409): Container b404f88469de81b7d44d8393dc086866277a25e6bc5d0c6928e7f6dce1edba1a is not running [job: populate] sh -c 'cp /source/datafile /data; touch /data/newfile' Failed to set container's TTY window size

This causes the init-named-volume example/integration test to fail, because the copy and touch commands run near-instantly. (This might not actually be true: my test failure may have a different cause, but the ERROR message still appears either way.) Brief testing shows that, on my machine, a command has to run for at least 10 ms to avoid the error reliably.

If a solution to this error cannot be found, a `sleep 0.1s` should be added to the populate job's command in init-named-volume.

After digging a bit deeper into the code: this error is caused by the initial SIGWINCH signal, which is sent immediately after the container is started in `runContainer()`. The container must be exiting before the signal is fully handled (further evidenced by the error message associated with Docker API error 409: "Can not reuse socket after connection was closed."). The error does not cause the job to fail; it only gets printed.

My suggestion here is to downgrade the message from an error to a warning. I will put up a PR for this change.
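
For illustration, here is a minimal sketch of what that downgrade could look like. It is not dobi's actual code: `resizeFunc`, `initialResize`, and `errNotRunning` are hypothetical names, and the resize call is simulated with a plain function so the example runs without Docker. The only point is that a failed initial resize produces a warning-level line instead of an error-level one.

```go
package main

import (
	"errors"
	"fmt"
)

// resizeFunc is a hypothetical stand-in for the client call that sets a
// container's TTY size; it is not dobi's or Docker's actual API.
type resizeFunc func(height, width int) error

// errNotRunning mimics the 409 response quoted in the report above.
var errNotRunning = errors.New("API error (409): Container is not running")

// initialResize performs the first window-size update after the container
// starts and returns the log line it would emit (empty on success).
// A failure is reported at warning level because, as observed in the report,
// the job itself is unaffected when the container exits before the resize.
func initialResize(resize resizeFunc, height, width int) string {
	if err := resize(height, width); err != nil {
		return fmt.Sprintf("[WARN] error=%v Failed to set container's TTY window size", err)
	}
	return ""
}

func main() {
	// Simulate a container that exited before the resize request arrived.
	exited := func(int, int) error { return errNotRunning }
	fmt.Println(initialResize(exited, 24, 80))
}
```

A stricter variant might only downgrade the "is not running" case and keep other resize failures as errors, but that depends on how the real client surfaces the 409.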