mwouts/jupytext

Jupyterlab adds jupytext metadata, but then `jupytext --sync` removes it again

Closed this issue · 9 comments

jli commented

Use case

My team's use case: we edit .ipynb notebooks in Jupyterlab. We use jupytext to create companion .py files for code review.

Historically, we've relied on everyone installing jupytext locally to update the .py files after notebook edits. Now, we'd like to add jupytext --sync via pre-commit to ensure the files are kept in sync.

Issue

  • With jupytext installed locally, edits to the notebook via Jupyterlab also automatically update the companion .py file. jupyterlab/jupytext also adds a jupytext.text_representation dict to the notebook's metadata.
  • When jupytext --sync runs, it removes metadata.jupytext.text_representation from the notebook file again.
  • This causes problems when running jupytext via pre-commit: it updates the .ipynb notebook file to remove the extra jupytext metadata and then fails because the index is outdated:
$ git commit -a -m 'updated notebook'
jupytext.................................................................Failed
- hook id: jupytext
- exit code: 2
- files were modified by this hook

[jupytext] Reading test_notebook.ipynb in format ipynb
[jupytext] Loading test_notebook.py
[jupytext] Updating test_notebook.ipynb
[jupytext] Error: the git index is outdated.
Please add the paired notebook with:
    git add test_notebook.ipynb
[jupytext] Updating the timestamp of test_notebook.py
[jupytext] Reading test_notebook.py in format py
[jupytext] Loading test_notebook.ipynb
[jupytext] Error: the git index is outdated.
Please add the paired notebook with:
    git add test_notebook.ipynb
[jupytext] Updating the timestamp of test_notebook.py

Rerunning git commit -a works, since we add jupytext's change to the git index. However, this isn't ideal because it's confusing, and it's annoying to have to do this every time we edit notebooks.

Details

I'm wondering why jupytext is adding this metadata in one context (editing via Jupyterlab) but removing it in another (called directly with jupytext --sync). I believe this is the code that removes the metadata:

jupytext_metadata.pop("text_representation", {})
(call stack: cli.main -> cli.jupytext_single_file -> lazy_write -> jupytext.writes)

It also seems that this only happens when using --sync. If I use --from ipynb --to py instead, then the original notebook isn't written, which avoids this issue. I'm using this as a workaround for now.

Versions

This is happening with recent versions:

$ python -V
Python 3.8.12
$ pip list | grep -i jup
jupyter-client       7.1.0
jupyter-core         4.9.1
jupyter-server       1.13.1
jupyterlab           3.2.5
jupyterlab-pygments  0.1.2
jupyterlab-server    2.10.2
jupytext             1.13.5

Hi @jli , thank you for reporting this. Well that is interesting! May I ask you to run this Python command?

from jupytext.config import find_jupytext_configuration_file
find_jupytext_configuration_file('.')

in the folder where you have the notebook? If any configuration file is found, can you share its content?

Note: It may happen that Jupytext CLI and Jupyter Lab are using different configurations, as Jupyter Lab cannot access files above the Jupyter root.

jli commented

@mwouts Thanks for the quick response! Yep, I do have a jupytext.toml config file. It looks like:

formats = "ipynb,py:percent"

(With further experimentation, I've found that without this config file, jupyterlab won't automatically create the companion .py file when I create new notebooks. But with the config, jupyterlab does automatically create the companion file.)

I've also put together a small demo repo for this issue: https://github.com/jli/jupytext_sync_900
It has jupytext.toml, requirements.txt, and (optional) .pre-commit-config.yaml files which I'm able to use to reproduce this issue, along with a README.md with specific reproduction steps.

Thanks!

Thank you @jli for the detailed information! I will make sure I can reproduce this, and then we will fix the issue. I'll keep you posted.

Hi again @jli , I am working on a test at https://github.com/mwouts/jupytext/compare/new_test_for_pre_commit_sync, but at the moment I have not been able yet to reproduce the issue.

The test looks like what you reported by maybe I have missed something. Do you think you could evolve that test into something that better reproduces the problematic flow? Thanks!

jli commented

I believe the difference between that test and my situation is that when I modify the notebook file, something is adding the jupytext.text_representation to the notebook metadata.

This is what's happening in my case:

  1. start: clean working tree, ipynb and py files in sync, no jupytext in notebook metadata.
  2. edit and save ipynb file with Jupyterlab + jupytext: both files are updated with the edit, but the notebook metadata also now has a jupytext.text_representation section.
  3. run jupytext --sync: invoking jupytext via the CLI removes the newly added jupytext metadata.

So in your test after updating the notebook, this assert should be true in order to match my situation:

    assert "text_representation" in tmpdir.join("test.ipynb").read()

I tried adding this, and the assert fails. So, the test isn't reproducing the part where editing the notebook adds the jupytext metadata section.

Thanks @jli for the additional information. I'll add steps 1. and 2. to the test and hopefully we'll reproduce this problematic jupytext.text_representation (which in theory belongs only to text files)

Indeed with the new version of the test at 8e2f3bb I can reproduce the issue that you documented above (thanks for pointing out at the exact pattern!)

        # modify the ipynb file in Jupyter
        # Reload the notebook
        nb = cm.get("test.ipynb")['content']
        nb.cells.append(new_markdown_cell("A new cell"))
        cm.save(dict(type="notebook", content=nb), "test.ipynb")
    
        # The text representation metadata is in the py file
        assert "text_representation" in tmpdir.join("test.py").read()
        # But not in the ipynb notebook
>       assert "text_representation" not in tmpdir.join("test.ipynb").read()
E       assert 'text_representation' not in ('{\n'\n ' "cells": [\n'\n '  {\n'\n '   "cell_type": "markdown",\n'\n '   "id": "cell-1",\n'\n '   "metadata": {},\n'\n '   "source": [\n'\n '    "A short notebook"\n'\n '   ]\n'\n '  },\n'\n '  {\n'\n '   "cell_type": "markdown",\n'\n '   "id": "cell-3",\n'\n '   "metadata": {},\n'\n '   "source": [\n'\n '    "A new cell"\n'\n '   ]\n'\n '  }\n'\n ' ],\n'\n ' "metadata": {\n'\n '  "jupytext": {\n'\n '   "text_representation": {\n'\n '    "extension": ".py",\n'\n '    "format_name": "percent",\n'\n '    "format_version": "1.3",\n'\n '    "jupytext_version": "1.13.5"\n'\n '   }\n'\n '  },\n'\n '  "kernelspec": {\n'\n '   "display_name": "Python 3",\n'\n '   "language": "python",\n'\n '   "name": "python_kernel"\n'\n '  }\n'\n ' },\n'\n ' "nbformat": 4,\n'\n ' "nbformat_minor": 5\n'\n '}\n')
E         'text_representation' is contained here:

Hi @jli , this should be fixed in the new version 1.13.6 that is being published on pypi. Would you mind giving it a try and report if the problem disappears? Thanks

jli commented

@mwouts I confirmed that with 1.13.6, the problem goes away: now edits in jupyter-lab no longer add the jupytext metadata.

Thanks a lot!