cassinyio/SwarmSpawner

start_timeout and http_timeout ignored

Opened this issue · 11 comments

I have set up c.SwarmSpawner.start_timeout=60 and c.SwarmSpawner.http_timeout=60 in jupyterhub_config.py, but it keeps timing out at 10 s. I could not see in the code that it overwrites the value from Spawner class. Is it not yet implemented?

I've noticed the same issue and it's pretty painful when trying to start a notebook on a separate worker node. I get a 500 error almost every time and reloading the hub via browser tends to be error prone (too many redirects, auth token expired, etc)

Does the remote worker have the image locally available?

@barrachri On mine, it does not when first launching. That's when I need a very large timeout as it takes a long time to download from my private repository from a separate datacenter. This problem compounds when considering I have multiple worker nodes each having to go through this initialization.

Setting c.Spawner.http_timeout = 60 * 5 also does not seem to work.

Would be good to also implement the start_timeout configuration.

I wonder if this is because we are going against JupyterHub 0.7.2 and it's not yet implemented until 0.8.0?

jupyterhub/dockerspawner#158

I don't know about http_timeout but start_timeout is implemented.
https://github.com/jupyterhub/jupyterhub/blob/c4ac1240aca5816aaca1ea2f2359fd1cd5ddf2ab/jupyterhub/user.py#L380

Can you try to spawn the notebook with pulling the image before?
Alias you run the notebook when you are sure the image is available.

I need to check but I remember Docker Swarm doesn't pull images.

I got authentication error when trying to get the image on the fly, so in my case the image is already there.

How I got this timeout error is by spawning a lot of notebooks at the same time (around 6-7). Some of them will take longer than 10 seconds to spawn, hence the timeout error.

What seems to be happening for me is that if the single-user notebook isn't already on the worker node, it takes a long time for it to pull that image and start up. That's expected. So it seems like when SwarmSpawner tries to start the service, the worker has to download the image but the browser client will get an error 500 after 20 seconds once logged in:

jupyterhub_jupyterhub.1.r85iflr13t5x@manager    | [I 2017-08-23 14:37:34.012 JupyterHub app:724] Loading cookie_secret from /srv/jupyterhub/jupyterhub_cookie_secret
jupyterhub_jupyterhub.1.r85iflr13t5x@manager    | [I 2017-08-23 14:37:34.054 JupyterHub app:892] Not using whitelist. Any authenticated user will be allowed.
jupyterhub_jupyterhub.1.r85iflr13t5x@manager    | [I 2017-08-23 14:37:34.117 JupyterHub swarmspawner:244] Docker service 'jupyter-5d9c68c6c50ed3d02a2fcf54f63993b6-1' is gone
jupyterhub_jupyterhub.1.r85iflr13t5x@manager    | [W 2017-08-23 14:37:34.118 JupyterHub swarmspawner:209] Docker service not found
jupyterhub_jupyterhub.1.r85iflr13t5x@manager    | [W 2017-08-23 14:37:34.119 JupyterHub app:1097] testuser not running.
jupyterhub_jupyterhub.1.r85iflr13t5x@manager    | [I 2017-08-23 14:37:34.128 JupyterHub app:1453] Hub API listening on http://0.0.0.0:8081/hub/
jupyterhub_jupyterhub.1.r85iflr13t5x@manager    | [I 2017-08-23 14:37:34.130 JupyterHub app:1176] Starting proxy @ http://0.0.0.0:443/
jupyterhub_jupyterhub.1.r85iflr13t5x@manager    | 14:37:34.283 - info: [ConfigProxy] Proxying https://0.0.0.0:443 to http://127.0.0.1:8081
jupyterhub_jupyterhub.1.r85iflr13t5x@manager    | 14:37:34.286 - info: [ConfigProxy] Proxy API at http://127.0.0.1:444/api/routes
jupyterhub_jupyterhub.1.r85iflr13t5x@manager    | [I 2017-08-23 14:37:34.336 JupyterHub app:1485] JupyterHub is now running at http://127.0.0.1:443/
jupyterhub_jupyterhub.1.r85iflr13t5x@manager    | [I 2017-08-23 14:38:22.471 JupyterHub swarmspawner:244] Docker service 'jupyter-5d9c68c6c50ed3d02a2fcf54f63993b6-1' is gone
jupyterhub_jupyterhub.1.r85iflr13t5x@manager    | [W 2017-08-23 14:38:22.471 JupyterHub swarmspawner:209] Docker service not found
jupyterhub_jupyterhub.1.r85iflr13t5x@manager    | [W 2017-08-23 14:38:22.810 JupyterHub swarmspawner:271] user_options:
jupyterhub_jupyterhub.1.r85iflr13t5x@manager    | [I 2017-08-23 14:38:22.815 JupyterHub swarmspawner:244] Docker service 'jupyter-5d9c68c6c50ed3d02a2fcf54f63993b6-1' is gone
jupyterhub_jupyterhub.1.r85iflr13t5x@manager    | [I 2017-08-23 14:38:22.829 JupyterHub swarmspawner:347] Created Docker service 'jupyter-5d9c68c6c50ed3d02a2fcf54f63993b6-1' (id: u4waxtq) from image localhost:5000/dlcc/jupyter-singleuser:latest
jupyterhub_jupyterhub.1.r85iflr13t5x@manager    | [W 2017-08-23 14:38:42.863 JupyterHub utils:88] Failed to connect to http://jupyter-5d9c68c6c50ed3d02a2fcf54f63993b6-1:8888/user/testuser ([Errno -2] Name or service not known)
jupyterhub_jupyterhub.1.r85iflr13t5x@manager    | [W 2017-08-23 14:38:42.870 JupyterHub web:1545] 500 POST /hub/login?next= (10.255.0.3): Spawner failed to start [status=1]
jupyterhub_jupyterhub.1.r85iflr13t5x@manager    | [E 2017-08-23 14:38:42.896 JupyterHub log:99] {
jupyterhub_jupyterhub.1.r85iflr13t5x@manager    |       "X-Forwarded-For": "10.255.0.3",
jupyterhub_jupyterhub.1.r85iflr13t5x@manager    |       "Content-Length": "39",
jupyterhub_jupyterhub.1.r85iflr13t5x@manager    |       "Origin": "https://192.168.99.101",
jupyterhub_jupyterhub.1.r85iflr13t5x@manager    |       "Accept-Encoding": "gzip, deflate, br",
jupyterhub_jupyterhub.1.r85iflr13t5x@manager    |       "Upgrade-Insecure-Requests": "1",
jupyterhub_jupyterhub.1.r85iflr13t5x@manager    |       "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36",
jupyterhub_jupyterhub.1.r85iflr13t5x@manager    |       "X-Forwarded-Proto": "https",
jupyterhub_jupyterhub.1.r85iflr13t5x@manager    |       "Referer": "https://192.168.99.101/hub/login",
jupyterhub_jupyterhub.1.r85iflr13t5x@manager    |       "Accept-Language": "en-US,en;q=0.8",
jupyterhub_jupyterhub.1.r85iflr13t5x@manager    |       "X-Forwarded-Port": "443",
jupyterhub_jupyterhub.1.r85iflr13t5x@manager    |       "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
jupyterhub_jupyterhub.1.r85iflr13t5x@manager    |       "Cookie": "_xsrf=2|1de0481f|e61c73853bc6122e5b90d21389055b81|1503410500",
jupyterhub_jupyterhub.1.r85iflr13t5x@manager    |       "X-Forwarded-Host": "192.168.99.101",
jupyterhub_jupyterhub.1.r85iflr13t5x@manager    |       "Host": "192.168.99.101",
jupyterhub_jupyterhub.1.r85iflr13t5x@manager    |       "Connection": "close",
jupyterhub_jupyterhub.1.r85iflr13t5x@manager    |       "Cache-Control": "max-age=0",
jupyterhub_jupyterhub.1.r85iflr13t5x@manager    |       "Content-Type": "application/x-www-form-urlencoded"
jupyterhub_jupyterhub.1.r85iflr13t5x@manager    |     }
jupyterhub_jupyterhub.1.r85iflr13t5x@manager    | [E 2017-08-23 14:38:42.897 JupyterHub log:100] 500 POST /hub/login?next= (@10.255.0.3) 20443.54ms
jupyterhub_jupyterhub.1.r85iflr13t5x@manager    | [W 2017-08-23 14:39:02.988 JupyterHub utils:88] Failed to connect to http://jupyter-5d9c68c6c50ed3d02a2fcf54f63993b6-1:8888/user/testuser ([Errno -2] Name or service not known)
jupyterhub_jupyterhub.1.r85iflr13t5x@manager    | [W 2017-08-23 14:39:23.109 JupyterHub utils:88] Failed to connect to http://jupyter-5d9c68c6c50ed3d02a2fcf54f63993b6-1:8888/user/testuser ([Errno -2] Name or service not known)
jupyterhub_jupyterhub.1.r85iflr13t5x@manager    | [W 2017-08-23 14:39:43.232 JupyterHub utils:88] Failed to connect to http://jupyter-5d9c68c6c50ed3d02a2fcf54f63993b6-1:8888/user/testuser ([Errno -2] Name or service not known)
jupyterhub_jupyterhub.1.r85iflr13t5x@manager    | [W 2017-08-23 14:40:03.354 JupyterHub utils:88] Failed to connect to http://jupyter-5d9c68c6c50ed3d02a2fcf54f63993b6-1:8888/user/testuser ([Errno -2] Name or service not known)
jupyterhub_jupyterhub.1.r85iflr13t5x@manager    | [W 2017-08-23 14:40:23.475 JupyterHub utils:88] Failed to connect to http://jupyter-5d9c68c6c50ed3d02a2fcf54f63993b6-1:8888/user/testuser ([Errno -2] Name or service not known)
jupyterhub_jupyterhub.1.r85iflr13t5x@manager    | [W 2017-08-23 14:40:43.596 JupyterHub utils:88] Failed to connect to http://jupyter-5d9c68c6c50ed3d02a2fcf54f63993b6-1:8888/user/testuser ([Errno -2] Name or service not known)
jupyterhub_jupyterhub.1.r85iflr13t5x@manager    | [W 2017-08-23 14:41:03.718 JupyterHub utils:88] Failed to connect to http://jupyter-5d9c68c6c50ed3d02a2fcf54f63993b6-1:8888/user/testuser ([Errno -2] Name or service not known)
jupyterhub_jupyterhub.1.r85iflr13t5x@manager    | [W 2017-08-23 14:41:23.842 JupyterHub utils:88] Failed to connect to http://jupyter-5d9c68c6c50ed3d02a2fcf54f63993b6-1:8888/user/testuser ([Errno -2] Name or service not known)
jupyterhub_jupyterhub.1.r85iflr13t5x@manager    | [W 2017-08-23 14:41:43.961 JupyterHub utils:88] Failed to connect to http://jupyter-5d9c68c6c50ed3d02a2fcf54f63993b6-1:8888/user/testuser ([Errno -2] Name or service not known)
jupyterhub_jupyterhub.1.r85iflr13t5x@manager    | [W 2017-08-23 14:42:34.354 JupyterHub orm:264] Adding missing route for testuser (<Server(jupyter-5d9c68c6c50ed3d02a2fcf54f63993b6-1:8888)>)
jupyterhub_jupyterhub.1.r85iflr13t5x@manager    | [I 2017-08-23 14:42:34.355 JupyterHub orm:188] Adding user testuser to proxy /user/testuser => http://jupyter-5d9c68c6c50ed3d02a2fcf54f63993b6-1:8888

Note that this is a run with the following configs set:

    c.Spawner.start_timeout = 300
    c.Spawner.http_timeout = 300

Those seem to at least make a difference in SwarmSpawner not dying after a few seconds of not getting the service up and running. But the browser client still ends up with:

500 : Internal Server Error
Spawner failed to start [status=1]

Expected behavior is for SwarmSpawner to report back that the service is starting.

I'm seeing the same occurrence as above.

Actually, it seems like after I set up http_timeout = 300, I will eventually get a 'Server never showed up at XXX', just only after the extended duration of time that I had set. It seems like my Hub is unable to communicate with the service it creates, because once I create it, the service appears to be running healthy.

I had similar issues that were somewhat improved by increasing slow_spawn_timeout. This comment suggests you can change it in the jupyter_config, though I just edited the jupyterhub source.

Let me post my config and logs In a moment.

Okay, so here are my logs and config file.

Jupyterhub output:

[I 2017-09-27 00:32:07.255 JupyterHub swarmspawner:239] Docker service 'jupyter-d37aebc5ce74fd9e4e744bdb9a9ba06b-1' is gone
[W 2017-09-27 00:32:07.255 JupyterHub swarmspawner:204] Docker service not found
[W 2017-09-27 00:32:07.567 JupyterHub swarmspawner:266] user_options:
[I 2017-09-27 00:32:07.572 JupyterHub swarmspawner:239] Docker service 'jupyter-d37aebc5ce74fd9e4e744bdb9a9ba06b-1' is gone
[I 2017-09-27 00:32:07.595 JupyterHub swarmspawner:337] Created Docker service 'jupyter-d37aebc5ce74fd9e4e744bdb9a9ba06b-1' (id: aebmhx7) from image jupyter/base-notebook:latest
[W 2017-09-27 00:32:17.585 JupyterHub base:347] User kel's server is slow to become responsive (timeout=10)
[I 2017-09-27 00:32:17.595 JupyterHub log:100] 302 POST /hub/login?next= (@127.0.0.1) 10424.27ms
[I 2017-09-27 00:32:17.595 JupyterHub login:91] User logged in: kel
[W 2017-09-27 00:32:47.910 JupyterHub user:278] kel's server never showed up at http://jupyter-d37aebc5ce74fd9e4e744bdb9a9ba06b-1:8888/user/kel after 30 seconds. Giving up
[I 2017-09-27 00:32:47.922 JupyterHub swarmspawner:368] Stopping and removing Docker service jupyter-d37aebc5ce74fd9e4e744bdb9a9ba06b-1 (id: aebmhx7)
[I 2017-09-27 00:32:47.957 JupyterHub swarmspawner:372] Docker service jupyter-d37aebc5ce74fd9e4e744bdb9a9ba06b-1 (id: aebmhx7) removed
[E 2017-09-27 00:32:48.052 JupyterHub gen:914] Exception in Future <tornado.concurrent.Future object at 0x7fb40584beb8> after timeout
Traceback (most recent call last):
File "/usr/lib64/python3.4/site-packages/tornado/gen.py", line 910, in error_callback
future.result()
File "/usr/lib/python3.4/site-packages/jupyterhub/user.py", line 294, in spawn
raise e
File "/usr/lib/python3.4/site-packages/jupyterhub/user.py", line 270, in spawn
yield self.server.wait_up(http=True, timeout=spawner.http_timeout)
File "/usr/lib/python3.4/site-packages/jupyterhub/orm.py", line 108, in wait_up
yield wait_for_http_server(self.url, timeout=timeout)
File "/usr/lib/python3.4/site-packages/jupyterhub/utils.py", line 94, in wait_for_http_server
**locals()
TimeoutError: Server at http://jupyter-d37aebc5ce74fd9e4e744bdb9a9ba06b-1:8888/user/kel didn't respond in 30 seconds

Here is my Jupyterhub Config. I'm relatively new to jupyterhub, and quite new to jupyterhub on swarm mode /w docker, so I may be missing something that may seem simple to others.

Configuration file for JupyterHub
import os
import subprocess
import os
import errno
import stat

c = get_config()
#pwd = os.path.dirname(file)

c.JupyterHub.ssl_cert = '/srv/jupyterhub/jupyter.hub.crt'
c.JupyterHub.ssl_key = '/srv/jupyterhub/jupyter.hub.key'

c.JupyterHub.ip = '0.0.0.0'
#c.JupyterHub.port = 443
c.JupyterHub.hub_ip = '0.0.0.0'
c.Authenticator.whitelist = {'kel','kellen'}
c.JupyterHub.spawner_class = 'cassinyspawner.SwarmSpawner'
c.JupyterHub.cleanup_servers = False
#c.SwarmSpawner.start_timeout = 300
c.SwarmSpawner.service_image = 'jupyter/base-notebook'
c.SwarmSpawner.jupyterhub_service_name = 'jupyterhub_jupyterhub'
#c.SwarmSpawner.service_prefix = "jupyterhub"
c.SwarmSpawner.networks = ["jupyterhub"]
#notebook_dir = os.environ.get('NOTEBOOK_DIR')
notebook_dir = '/home/jovyan/work'
c.SwarmSpawner.use_user_options = True
c.SwarmSpawner.notebook_dir = notebook_dir
c.SwarmSpawner.container_spec = {
'command' : ['start-singleuser.sh'],
'Image' :'jupyter/base-notebook:latest',
'mounts' : [{'type' : 'volume',
'source' : 'jupyterhub-user-{username}',
'target' : '/home/joyvan/work'}]
}