Prevent server from getting stopped during long simulations
JordiBolibar opened this issue · 12 comments
So far I am able to run long simulations as long as I keep my session more or less active (i.e. during my working hours). However, as soon as I leave it running during the night, my server is eventually stopped and the simulations get interrupted. I have configured my SSH file .ssh/ssh_config with keep alive as follows:
TCPKeepAlive yes
ClientAliveInterval 30
ClientAliveCountMax 240
X11Forwarding yes
X11UseLocalhost no
Is there anything else I should do in order to avoid that? What am I missing? Thanks in advance!
Ah thanks for reporting this @JordiBolibar, this is a novel situation for me. The automated logic to terminate an "inactive" server is quite intricate, but it won't check for open SSH connections as a sign of activity.
- I've opened an issue about this in the project that has enabled us to have SSH connections to our Jupyter servers, yuvipanda/jupyterhub-ssh, so we could get potential help solving this more thoroughly.
- We need to have a workaround for now.
I'm quite confident you can accomplish a workaround if you visit your server via the web UI (hub.jupytearth.org), start any notebook, and keep it running like this:
import time
# To avoid mistakenly keeping a server running for months on end,
# adding costs to the cloud bill, avoid putting in 999999999999 etc.
time.sleep(3600*24)
@JordiBolibar try the workaround for now and let me know if it works. I'm 99% confident it will block the server from being stopped automatically.
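The sleep workaround above could also be wrapped so that a typo cannot keep the server alive for months. This is only a minimal sketch: `keep_alive`, `MAX_HOURS`, and the 48-hour cap are hypothetical names and values chosen for illustration, not something configured on the hub.

```python
import time

MAX_HOURS = 48  # hypothetical safety cap; pick whatever suits your cloud budget


def keep_alive(hours, step_seconds=60, max_hours=MAX_HOURS, sleep=time.sleep):
    """Keep the kernel busy for `hours`, but never longer than `max_hours`.

    Returns the number of seconds actually slept.
    """
    total = min(hours, max_hours) * 3600
    slept = 0
    while slept < total:
        # Sleep in short steps so a progress message could be added here.
        chunk = min(step_seconds, total - slept)
        sleep(chunk)
        slept += chunk
    return slept
```

In a notebook cell this would be called as `keep_alive(24)`; asking for more than the cap is silently reduced to 48 hours.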
Hi Erik, thanks for the quick reply. I am in fact connecting through the web UI. I forgot I'm no longer attempting to connect from VSCode, since VSCode is now integrated in the browser. So indeed, the SSH configuration I posted is useless.
And thanks for the workaround. I will implement the same thing in Julia and see if that does the trick.
Oh hmmm
- Do you think the bug I reported in yuvipanda/jupyterhub-ssh#67 is correct or incorrect?
- Is it correct that you are using visual studio code via the web interface (hub.jupytearth.org)?
- Can you clarify what "running a simulation" implies? Are you running something from a terminal opened in the hub.jupytearth.org based visual studio code UI?
> And thanks for the workaround. I will implement the same thing in Julia and see if that does the trick.
The goal is to keep the server from being stopped, and to do that you need to be seen as active. This workaround relies on having a "jupyter kernel" running at all times. What's important isn't whether you run it in Julia or Python, but that a registered kernel is actually running. In practice, you can visit https://hub.jupytearth.org/hub/user-redirect/lab and start a Python notebook and associated kernel running this sleep command, and then go to https://hub.jupytearth.org/hub/user-redirect/vscode and keep working.
I think you can list the running kernels from a terminal with jupyter server list.
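As far as I can tell, `jupyter server list` reports running servers rather than individual kernels; the kernels themselves are exposed by the Jupyter Server REST API under `api/kernels`. The sketch below shows one way to check whether a registered kernel is running. The helper names are hypothetical, and using `http://127.0.0.1:8888` plus `JUPYTERHUB_SERVICE_PREFIX` as the base URL and `JUPYTERHUB_API_TOKEN` as the token is an assumption about this hub's setup.

```python
import json
import urllib.request


def fetch_kernels(base_url, token):
    """GET <base_url>/api/kernels and return the decoded JSON list."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/api/kernels",
        headers={"Authorization": f"token {token}"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def summarize_kernels(kernels):
    """Return one (name, execution_state, last_activity) tuple per kernel."""
    return [
        (k.get("name"), k.get("execution_state"), k.get("last_activity"))
        for k in kernels
    ]
```

From a terminal on the server, something like `summarize_kernels(fetch_kernels(base_url, token))` should then show the sleeping Python kernel as busy, assuming the URL and token guesses above hold.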
- It depends if code-server is connecting via SSH or not. I'm not sure if it needs to do so or if it is run in the same server.
- Yes, I'm using VSCode via code-server directly in the browser, the one present in the launcher.
- Yes, I'm running a Julia simulation from the VSCode browser version from the hub UI.
OK, I will stick with the Jupyter notebook trick for now.
I want to come back to this issue that @JordiBolibar opened some time ago.
I have been using the time.sleep() trick for some time now, and I have noticed that the kernel of that notebook also dies without warning before the set time elapses. I wonder if there is a more stable solution to keep a server running for several hours, especially for long and computationally expensive simulations. @consideRatio do you have any idea how to accomplish this? Would running the sleep command from a terminal have the same effect?
Thank you!
I think the way to do this is to:
- Write a Jupyter server extension that has a UI that says 'keep this server alive'
- When it is enabled, it'll keep reporting that it is active to the server's API
- This will ensure that the multiple killers we have (idle and in-server) don't get to it.
So users can go to this page, and say 'keep alive for 8h, 24h, until turned off' etc
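The reporting step described above could look roughly like the sketch below. It is not the extension itself: the function names are hypothetical, and it assumes the single-user server has `JUPYTERHUB_ACTIVITY_URL` and `JUPYTERHUB_API_TOKEN` in its environment and that the hub accepts an activity POST with `last_activity` timestamps. It would only address the hub-side idle culler; any in-server culling is configured separately.

```python
import json
import os
import urllib.request
from datetime import datetime, timedelta, timezone


def build_activity_payload(now, server_name=""):
    """Build the JSON body reporting `now` as the latest activity."""
    stamp = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")
    return {
        "servers": {server_name: {"last_activity": stamp}},
        "last_activity": stamp,
    }


def should_keep_alive(now, enabled_at, hours):
    """True while the chosen window is open; hours=None means 'until turned off'."""
    if hours is None:
        return True
    return now < enabled_at + timedelta(hours=hours)


def post_activity(payload, url=None, token=None):
    """POST the payload to the hub's activity endpoint (assumed env vars)."""
    url = url or os.environ["JUPYTERHUB_ACTIVITY_URL"]
    token = token or os.environ["JUPYTERHUB_API_TOKEN"]
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"token {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

A periodic task would call `post_activity(build_activity_payload(datetime.now(timezone.utc)))` every few minutes for as long as `should_keep_alive(...)` returns true, which maps onto the '8h, 24h, until turned off' choices.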
@minrk's work on https://github.com/minrk/jupyter-keepalive addresses this quite well, I think.
keepalive.mov
With #155 I'll add it to the base image.
Hi @consideRatio, I'm still having issues with this. I cannot find the Keep server alive option, and running the notebook with a sleep command still doesn't work. I cannot run long simulations since my server gets disconnected after a short while.
Is it normal that I cannot access the option you displayed above in the video? Thanks a lot in advance!
Ah, I ended up disabling it when trying to resolve a very challenging upgrade of other packages with coupled dependencies.
jupyter-earth/hub.jupytearth.org-image/Dockerfile
Lines 415 to 422 in 09fe9f2
I'll see if I can upstream a resolution to this by getting the package built and published so that nodejs isn't required.
I opened minrk/jupyter-keepalive#4 @JordiBolibar.
I'll see if I can re-configure something to help you avoid getting shut down.
Awesome, thanks a lot for your help!
@JordiBolibar I've not dropped the ball on this, but I'm swamped with work items. There is progress on getting jupyter-keepalive to help us here, so I'm currently aiming for that as a resolution.