jupyter/notebook

Bug : Need to run plt.plot() twice from a freshly started notebook to see the plot

sebma opened this issue · 12 comments

sebma commented

Hi,

Steps to reproduce :

  1. Save, then shut down the current notebook (using the "Running" tab)
  2. Click on the ".ipynb" file to start the notebook, OR create a new notebook
  3. Run this :
In [1]: import matplotlib.pyplot as plt
        plt.plot([1,2,3,4])
Out[1]: [<matplotlib.lines.Line2D at 0x111a13b00>]

In [2]: plt.plot([1,2,3,4])
Out[2]: [<matplotlib.lines.Line2D at 0x111a13b00>]


Can you please have a look ?

I think the issue is with doing it in the same cell where you import matplotlib - there's some setup that only happens after that cell has run.

Does this mean that this is the expected notebook behavior? Or is this actually a bug? I can confirm that this is an issue for me, and I can also confirm that splitting out the import into a separate cell does work. (It would be preferable if this wasn't happening for a short course that I'm teaching soon-ish, which is why I ask.)

sebma commented

@takluyver Hi, I added import matplotlib.pyplot as plt at the end of my $HOME/.jupyter/jupyter_notebook_config.py as a workaround:

try:
    import matplotlib as mpl
    import matplotlib.pyplot as plt
except ImportError:
    # matplotlib is not installed; carry on without it
    pass

After restarting the jupyter-notebook process, reopening my notebook, and running a first cell that contains plt.plot([1,2,3,4]), I get:

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-1-61e052f4bd52> in <module>()
----> 1 plt.plot([1,2,3,4])

NameError: name 'plt' is not defined

@rnelsonchem It's somewhere in between. I think that if someone can figure out a way to fix it that doesn't break other things, that would probably be welcome.

@sebma config files don't go into your interactive namespace. Try a startup file instead.

sebma commented

@takluyver Thanks for the tip!

I'm a little confused: why are the config files spread across the ~/.jupyter and ~/.ipython directories?

Is this because IPython is the Jupyter kernel for the Python language?

Yep. IPython deals with running your Python code, Jupyter provides the notebook interface, including things like saving and loading notebook files.

So there's an extra reason that putting it in jupyter_notebook_config.py won't work - that code is not even run in your kernel. But even putting it in an IPython config file wouldn't do what you want. You specifically need an IPython startup file.
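For illustration, a startup file might look like the sketch below. It assumes the default profile, whose startup directory is normally ~/.ipython/profile_default/startup/; the file name 00-matplotlib.py is made up. Files in that directory are run in the kernel's interactive namespace, so plt would be defined in the first cell. Note that this only addresses the NameError: nothing in this thread establishes that it changes whether the first plot is displayed.

```python
# Hypothetical file: ~/.ipython/profile_default/startup/00-matplotlib.py
# (the path assumes the default IPython profile; the file name is arbitrary)
try:
    import matplotlib as mpl
    import matplotlib.pyplot as plt
except ImportError:
    # matplotlib is not installed: leave placeholders rather than failing at startup
    mpl = plt = None
```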

sebma commented

@takluyver Thanks :-)

Try using “%matplotlib inline”. According to the IPython documentation:
...
%matplotlib inline
With this backend, the output of plotting commands is displayed inline within frontends like the Jupyter notebook, directly below the code cell that produced it. The resulting plots will then also be stored in the notebook document.

There is some discussion of %matplotlib inline here:
https://stackoverflow.com/questions/43027980/purpose-of-matplotlib-inline

@takluyver I agree: the fault seems tied to the cell in which matplotlib is first imported.

This particularly affects libraries that lazily import matplotlib, such as xarray:

In [1]: import xarray, numpy as np
In [2]: x = xarray.DataArray(np.random.random((5,5)))
In [3]: x.plot() # does not work
In [4]: x.plot() # does work

I think this behaviour is counter-intuitive and undesirable.

How exactly does the notebook change when matplotlib is imported?

That's unfortunate. I can't really remember the details of what goes on here, but it involves this code in ipykernel and this code in IPython.

Certainly if you fall into the configure_once branch, you'll have trouble - that defers setting up the integration until after the cell has finished running, which will be too late for plots generated in that cell.

There's a comment there saying that that branch is only needed for Python 2. But it's possible that other changes since that was written mean that branch is used on Python 3 as well.

If you inspect <globals>.get_ipython().events.callbacks you can watch some state changes occurring:

Within one cell, the line importing matplotlib.pyplot immediately causes matplotlib.pyplot.install_repl_displayhook.<locals>.post_execute() and ipykernel.pylab.backend_inline._enable_matplotlib_integration.<locals>.configure_once(ExecutionResult) to be scheduled at the conclusion of the cell.

For all subsequent cells, the latter callback is replaced by ipykernel.pylab.backend_inline.flush_figures().

(Seems post_execute() rasterises the current figure, then flush_figures() displays that image.)
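A minimal sketch of that inspection, for anyone who wants to repeat it: describe_callbacks is a hypothetical helper, not part of IPython, and the code relies only on get_ipython().events.callbacks being a mapping from event names to lists of callables. Run it in a cell before and after the import to compare.

```python
def describe_callbacks(callbacks):
    """Map each event name to the dotted names of its registered callbacks."""
    def dotted_name(cb):
        module = getattr(cb, "__module__", None) or "?"
        return f"{module}.{getattr(cb, '__qualname__', repr(cb))}"
    return {event: [dotted_name(cb) for cb in funcs]
            for event, funcs in callbacks.items()}

try:
    ip = get_ipython()  # only defined when running under IPython
except NameError:
    ip = None

if ip is not None:
    for event, names in describe_callbacks(ip.events.callbacks).items():
        print(event, names)
```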


When ipykernel.pylab.backend_inline is being imported (by matplotlib.pyplot), it runs _enable_matplotlib_integration, which tries to set things up* (and critically, to register the flush_figures() callback). If that setup fails unexpectedly, it is postponed (by registering configure_once which will attempt the same setup again then immediately unregister itself).

It seems that the underlying bug relates to circular imports (causing that setup to initially fail). The configure_once mechanism is a work-around, to delay execution of setup until after the modules have finished being imported. This work-around is flawed, preventing the first cell from working (because the setup won't register flush_figures() until just after the callback list has already been executed for the cell).
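The timing problem can be illustrated with a toy model. This is only a sketch under stated assumptions: ToyEvents and simulate are hypothetical stand-ins rather than IPython's real event manager, and the model assumes that post_execute fires before post_run_cell for every cell, as the description here implies.

```python
class ToyEvents:
    """Hypothetical stand-in for an event manager with two callback lists."""
    def __init__(self):
        self.callbacks = {"post_execute": [], "post_run_cell": []}

    def register(self, event, fn):
        self.callbacks[event].append(fn)

    def unregister(self, event, fn):
        self.callbacks[event].remove(fn)

    def trigger(self, event):
        # Iterate over a copy so callbacks may unregister themselves.
        for fn in list(self.callbacks[event]):
            fn()


def simulate(n_cells):
    """Return, per cell, whether flush_figures ran at the end of that cell."""
    events = ToyEvents()
    flushed = []

    def flush_figures():
        flushed[-1] = True

    def configure_once():
        # Deferred setup: registers flush_figures, then removes itself.
        events.register("post_execute", flush_figures)
        events.unregister("post_run_cell", configure_once)

    # Setup failed during import, so it was postponed to post_run_cell.
    events.register("post_run_cell", configure_once)

    for _ in range(n_cells):
        flushed.append(False)            # the cell body would plot here
        events.trigger("post_execute")   # too early for the first cell
        events.trigger("post_run_cell")  # configure_once runs here
    return flushed


print(simulate(3))  # [False, True, True]
```

In this model the first cell never gets flushed because flush_figures is registered only after the post_execute list for that cell has already run.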


Note that everything works if the imports occur in a different order:

In [1]: import ipykernel.pylab.backend_inline # import this first
        import matplotlib.pyplot as plt
        plt.plot([1,3,2]) # this works

*The setup mentioned earlier involves executing a couple of functions in IPython.core.pylabtools, which in turn try to import and use components of matplotlib.pyplot and ipykernel.pylab.backend_inline. The complication is that (depending on the order of the top-level imports) one or both of those modules will itself be only partially imported at this stage.

Normally, ipykernel.pylab.backend_inline is imported during the import of matplotlib.pyplot (because the IPython kernel has set an environment variable that makes it the default backend for matplotlib). Until this finishes, Python does not yet assign the pyplot attribute on matplotlib, and thus subroutines trying to reference pyplot (by parent attribute) raise exceptions. The code may be more resilient against circular imports if IPython.core.pylabtools.activate_matplotlib used the from matplotlib import pyplot as plt syntax (and later Python versions may also improve the situation):

In [1]: import inspect, IPython.core.pylabtools
        exec(inspect.getsource(IPython.core.pylabtools.activate_matplotlib)
                .replace('import matplotlib.pyplot', 'from matplotlib import pyplot')
                .replace('matplotlib.pyplot', 'pyplot'), 
             IPython.core.pylabtools.__dict__) # apply patch

        import matplotlib.pyplot as plt
        plt.plot([1,3,2]) # this works also, in python 3.6
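The difference between the two import styles can be reproduced without matplotlib. The sketch below builds a throwaway package in a temporary directory; toypkg, toypkg.front (standing in for matplotlib.pyplot) and toyhook (standing in for the backend module that runs setup code at import time) are all made-up names, and run_demo is a hypothetical helper.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Source of the module that runs "setup" code while toypkg.front is half imported.
HOOK_SOURCE = """\
results = {}
import toypkg.front  # succeeds: the partially imported module is in sys.modules
try:
    toypkg.front  # parent attribute is bound only after front finishes importing
    results["attribute_access"] = "ok"
except AttributeError:
    results["attribute_access"] = "AttributeError"
try:
    from toypkg import front  # can fall back to sys.modules["toypkg.front"]
    results["from_import"] = "ok"
except ImportError:
    results["from_import"] = "ImportError"
"""


def run_demo():
    """Import toypkg.front, which imports toyhook midway, and report what toyhook saw."""
    with tempfile.TemporaryDirectory() as root:
        pkg = Path(root) / "toypkg"
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "front.py").write_text("import toyhook\nVALUE = 1\n")
        (Path(root) / "toyhook.py").write_text(HOOK_SOURCE)

        sys.path.insert(0, root)
        try:
            importlib.invalidate_caches()
            importlib.import_module("toypkg.front")
            return dict(sys.modules["toyhook"].results)
        finally:
            # Undo the changes to interpreter state made by the demo.
            sys.path.remove(root)
            for name in ("toypkg.front", "toypkg", "toyhook"):
                sys.modules.pop(name, None)


print(run_demo())
```

On current Python 3 releases the attribute access is expected to fail while the from-import succeeds, which mirrors the failure mode described above.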

So there are several different ways of fixing our problem. Ideally, the imports should be fixed so the workaround isn't needed (preferably by eliminating circular imports and/or by not executing functions during import, but possibly just by using more resilient import syntaxes). Alternatively, flush_figures could be registered earlier (by _enable_matplotlib_integration), or be invoked directly from configure_once, or the order of callback evaluation could be changed so that flush_figures doesn't miss its turn (say by appending flush_figures to the same callback list as configure_once, or by executing the post_run_cell list prior to the post_execute list).

In [1]: import matplotlib.pyplot as plt
        get_ipython().events.trigger('post_run_cell') # call configure_once() earlier
        plt.plot([1,3,2]) # also works
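Three of the alternatives listed above can be compared in a toy model of a single first cell. This is a sketch, not the actual patch: first_cell_displays and its fix labels are hypothetical, and plain lists stand in for IPython's callback registry.

```python
def first_cell_displays(fix=None):
    """Return True if the toy model flushes figures at the end of the first cell."""
    callbacks = {"post_execute": [], "post_run_cell": []}
    displayed = []

    def flush_figures():
        displayed.append(True)

    def configure_once():
        callbacks["post_run_cell"].remove(configure_once)
        if flush_figures not in callbacks["post_execute"]:
            callbacks["post_execute"].append(flush_figures)
        if fix == "flush_in_configure_once":
            flush_figures()  # invoke directly so the first cell is not missed

    callbacks["post_run_cell"].append(configure_once)
    if fix == "register_early":
        # Register flush_figures up front, without waiting for the deferred setup.
        callbacks["post_execute"].append(flush_figures)

    order = ["post_execute", "post_run_cell"]
    if fix == "swap_order":
        order.reverse()  # run the post_run_cell list before post_execute

    for event in order:
        for fn in list(callbacks[event]):
            fn()
    return bool(displayed)


for fix in (None, "register_early", "flush_in_configure_once", "swap_order"):
    print(fix, first_cell_displays(fix))
```

Only the unmodified model (fix=None) misses the first cell; each variant gets flush_figures to run once for it.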

I'll see about putting together a PR or two.

@benjimin's fix in IPython >= 7.10.0 appears to resolve the issue - so I'm closing it. Thanks @benjimin!
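To check whether an installed IPython is at or above that version, something like the following sketch could be used. includes_fix is a hypothetical helper; it assumes only that IPython.version_info is a tuple whose first two entries are the integer major and minor version numbers.

```python
def includes_fix(version_info):
    """True if the (major, minor, ...) tuple is at least 7.10."""
    return tuple(version_info[:2]) >= (7, 10)

try:
    import IPython
except ImportError:
    IPython = None  # IPython is not installed in this environment

if IPython is not None:
    print(IPython.__version__, includes_fix(IPython.version_info))
```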