equinor/webviz-config

Gunicorn worker reboot provokes callback crash while serializing figures

With sufficiently heavy callback structures, frequent rebooting of gunicorn workers provokes a crash that sends the user back to the entry page. The crash occurs in part because of how the newly added orjson support is implemented within dash.

How to reproduce
I have made a minimal plugin that reproduces the error consistently on my end:

# contents of ./plugins/orjson_crash.py
from uuid import uuid4

from dash import html, dcc
from dash.dependencies import Output, Input
from webviz_config import WebvizPluginABC

import plotly.graph_objects as go


class OrjsonCrash(WebvizPluginABC):
    def __init__(self, app):
        super().__init__()
        n_comps = 20
        self.figids = [str(uuid4()) for _ in range(n_comps)]
        self.clids = [str(uuid4()) for _ in range(n_comps)]

        # Register one callback per (graph, checklist) pair.
        for figid, clid in zip(self.figids, self.clids):

            @app.callback(Output(figid, "figure"), Input(clid, "value"))
            def plot(val):
                figure = go.Figure(
                    data=[go.Bar(x=[1, 2, 3], y=[1, 3, 2])],
                    layout=go.Layout(
                        title=go.layout.Title(text="A Figure Specified By A Graph Object")
                    ),
                )
                return figure

    @property
    def layout(self):
        return html.Div(
            id="main-title",
            children=[dcc.Graph(id=figid) for figid in self.figids]
            + [dcc.Checklist(id=clid) for clid in self.clids],
        )

# contents of ./orjson_break.yaml
title: Orjson Crash

pages:
  - title:  Orjson Crash 1
    content:
      - OrjsonCrash:

  - title:  Orjson Crash 2
    content:
      - OrjsonCrash:

(setup.py and requirements.txt elided for brevity).

Install the project and build a portable webviz app:
webviz build orjson_break.yaml --portable tmp
Then run it with the default gunicorn settings set by webviz (cf. the Dockerfile in the freshly created tmp folder):

gunicorn --bind 0.0.0.0:5000 \
               --keep-alive 120 \
               --max-requests 40 \
               --preload \
               --workers 2 \
               --worker-class gthread \
               --threads 2 \
               --timeout 1000000 \
               "tmp.webviz_app:server"
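
When experimenting with different worker-reboot settings, it can be handy to keep the same options in a gunicorn config file instead of on the command line. The following is a sketch that mirrors the flags above; the file name gunicorn.conf.py is an assumption, and the values are copied from the command as given.

```python
# gunicorn.conf.py (hypothetical file name), mirroring the CLI flags above.
bind = "0.0.0.0:5000"
keepalive = 120          # --keep-alive 120
max_requests = 40        # --max-requests 40: restart a worker after 40 requests
preload_app = True       # --preload
workers = 2              # --workers 2
worker_class = "gthread" # --worker-class gthread
threads = 2              # --threads 2
timeout = 1000000        # --timeout 1000000
```

It would then be started with gunicorn -c gunicorn.conf.py "tmp.webviz_app:server".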

Open the bound URL, then alternate between the two pages in the app. After clicking back and forth a few times, the crash should appear (if it doesn't, try increasing the number of callbacks by changing n_comps). To make it easier to see that the crash has occurred, keep the browser's developer console open while clicking. On the server side, you should find the traceback:

ERROR:tmp.webviz_app:Exception on /_dash-update-component [POST]
Traceback (most recent call last):
  File "/home/mhwa/.cache/pypoetry/virtualenvs/temp-BbzQGPLE-py3.8/lib/python3.8/site-packages/flask/app.py", line 2073, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/mhwa/.cache/pypoetry/virtualenvs/temp-BbzQGPLE-py3.8/lib/python3.8/site-packages/flask/app.py", line 1518, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/mhwa/.cache/pypoetry/virtualenvs/temp-BbzQGPLE-py3.8/lib/python3.8/site-packages/flask/app.py", line 1516, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/mhwa/.cache/pypoetry/virtualenvs/temp-BbzQGPLE-py3.8/lib/python3.8/site-packages/flask/app.py", line 1502, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
  File "/home/mhwa/.cache/pypoetry/virtualenvs/temp-BbzQGPLE-py3.8/lib/python3.8/site-packages/dash/dash.py", line 1336, in dispatch
    response.set_data(func(*args, outputs_list=outputs_list))
  File "/home/mhwa/.cache/pypoetry/virtualenvs/temp-BbzQGPLE-py3.8/lib/python3.8/site-packages/dash/_callback.py", line 191, in add_context
    jsonResponse = to_json(response)
  File "/home/mhwa/.cache/pypoetry/virtualenvs/temp-BbzQGPLE-py3.8/lib/python3.8/site-packages/dash/_utils.py", line 21, in to_json
    return to_json_plotly(value)
  File "/home/mhwa/.cache/pypoetry/virtualenvs/temp-BbzQGPLE-py3.8/lib/python3.8/site-packages/plotly/io/_json.py", line 127, in to_json_plotly
    opts = orjson.OPT_SORT_KEYS | orjson.OPT_SERIALIZE_NUMPY

Expected behavior
No crashing.

Current workarounds
Increasing the number of requests served before gunicorn reboots a worker alleviates the problem (as does not rebooting workers at all). Adding a dummy import orjson statement to any __init__.py appears to 'solve' the problem for our purposes.
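
A minimal sketch of the dummy-import workaround, assuming it is placed in the __init__.py of a package that is imported when the app starts. The try/except guard is an addition for environments where orjson is not installed; the workaround as described is just the bare import. Why this helps has not been verified here; one plausible reading is that the module is already fully imported before any callback tries to serialize a figure.

```python
# Hypothetical contents of a package __init__.py.
# Import orjson eagerly at module load time instead of leaving the first
# import to happen inside a request handler.
try:
    import orjson  # noqa: F401  (imported only for its side effect)
except ImportError:
    # Assumption: orjson is optional, so fall back silently if it is missing.
    orjson = None
```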

Additional context
It feels like this is an error within dash, if I am being honest, but it might be useful to add a workaround to webviz itself.

This is related to plotly/plotly.py#2433, so I have added a reproducing example there as well.