srstevenson/nb-clean

Filter cleans python version metadata

Closed this issue · 7 comments

The filter installed by nb-clean add-filter --preserve-cell-metadata removes the Python version from the notebook metadata (the language_info block at the end of the notebook). This causes a metadata mismatch between local Git copies and the notebooks on GitHub.

- "pygments_lexer": "ipython3",
- "version": "3.8.8"
+ "pygments_lexer": "ipython3"

Every time that I open a notebook after pushing with the filter, I get my notebook modified. Is it possible to fix that? Thanks in advance!

nb-clean removes the Python version from the global metadata by design. Do you have a specific use case where maintaining this metadata is required?

Every time that I open a notebook after pushing with the filter, I get my notebook modified

Can you share what commands you're running, and how you installed and configured nb-clean? When using the Git filter integration, notebooks are cleaned as they're added to the index prior to recording a commit, but the working copy shouldn't be modified (unless you've also run nb-clean clean yourself, outside the Git filter).
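To illustrate why the working copy should stay untouched: Git runs a clean filter on a file's content as it is staged, passing that content on stdin and storing whatever the filter writes to stdout in the index, while the file on disk is left alone. The sketch below is a hypothetical stand-in for such a filter, not nb-clean's implementation. It assumes nbformat 4 JSON, only drops metadata.language_info.version, and the names clean_for_index and main are made up for illustration.

```python
import json
import sys


def clean_for_index(notebook_text: str) -> str:
    """Return the notebook JSON as it would be stored in the index (hypothetical sketch).

    Only metadata.language_info.version is removed; the file on disk is never touched.
    """
    notebook = json.loads(notebook_text)
    language_info = notebook.get("metadata", {}).get("language_info", {})
    language_info.pop("version", None)
    # Roughly how Jupyter serialises notebooks: one-space indent, sorted keys.
    return json.dumps(notebook, indent=1, sort_keys=True, ensure_ascii=False) + "\n"


def main() -> None:
    # Git pipes the staged file's content in on stdin and stores stdout in the index.
    sys.stdout.write(clean_for_index(sys.stdin.read()))
```

A script calling main() could be registered as a clean filter with git config filter.<name>.clean "<command>" plus a *.ipynb filter=<name> line in .gitattributes, where <name> and <command> are whatever you choose.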

Hi! I activated the filter on a specific Git repo with the following command: nb-clean add-filter --preserve-cell-metadata --remove-empty-cells. I would prefer that --preserve-cell-metadata not touch the notebook's metadata at all, including the Python version.

I introduced this library to my team. It works fine for each of us individually, but I've had some complaints about being forced to commit notebooks that nobody touched.

Maybe the issue arises when some team members don't use the library: they push notebooks with uncleaned metadata to GitHub, and the rest of us end up cleaning up after them every time.

So is a --preserve-notebook-metadata option needed? @Nicolae93
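Conceptually, a --preserve-notebook-metadata option would just skip the notebook-level step. A minimal sketch, assuming nbformat 4 JSON loaded as a dict; the function clean_notebook and its keyword arguments are hypothetical and do not mirror nb-clean's actual API:

```python
import copy
from typing import Any


def clean_notebook(
    notebook: dict[str, Any],
    preserve_cell_metadata: bool = False,
    preserve_notebook_metadata: bool = False,
) -> dict[str, Any]:
    """Return a cleaned copy of the notebook (hypothetical sketch)."""
    cleaned = copy.deepcopy(notebook)  # never mutate the caller's notebook
    for cell in cleaned.get("cells", []):
        if not preserve_cell_metadata:
            cell["metadata"] = {}
        if cell.get("cell_type") == "code":
            # Clear outputs and execution counts on code cells.
            cell["outputs"] = []
            cell["execution_count"] = None
    if not preserve_notebook_metadata:
        # The behaviour discussed in this issue: only the Python version is dropped.
        cleaned.get("metadata", {}).get("language_info", {}).pop("version", None)
    return cleaned
```

With preserve_notebook_metadata=True the language_info block, including the version, passes through unchanged.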

nb-clean removes the Python version from the global metadata by design. Do you have a specific use case where maintaining this metadata is required?

I think this behavior is rather strange. nb-clean should either always clean all notebook metadata or not touch it at all. Why clean just the Python version and not the rest?

In my case, VSCode adds its own metadata, and the kernelspec is modified too. This causes Git diffs in environments with many developers.
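To see which notebook-level fields are producing such diffs (for example a vscode entry or a changed kernelspec), one can compare the top-level metadata of two revisions of the same notebook. A small sketch assuming both revisions are loaded as dicts; the helper changed_metadata_keys is hypothetical:

```python
from typing import Any

_MISSING = object()  # sentinel so "absent" and "present with value None" differ


def changed_metadata_keys(old: dict[str, Any], new: dict[str, Any]) -> list[str]:
    """List top-level notebook metadata keys that were added, removed, or changed."""
    old_meta = old.get("metadata", {})
    new_meta = new.get("metadata", {})
    keys = set(old_meta) | set(new_meta)
    # A key counts as changed if it is missing on one side or its value differs.
    return sorted(
        k for k in keys if old_meta.get(k, _MISSING) != new_meta.get(k, _MISSING)
    )
```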

Why clean just the Python version and not the rest?

This was originally implemented because it's common that different contributors to a project will be using different versions of Python (perhaps just differing by patch version), which leads to spurious diffs when different contributors alternately commit a notebook. The other metadata fields are (or at least were at the time of nb-clean's inception) less likely to change with each contributor.
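The rationale can be checked directly: two notebooks that differ only in the recorded patch version serialise to different JSON, and dropping metadata.language_info.version makes them identical. A minimal sketch assuming nbformat 4 JSON; strip_python_version and as_committed are hypothetical helpers, not nb-clean's code:

```python
import copy
import json
from typing import Any


def strip_python_version(notebook: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the notebook without metadata.language_info.version."""
    cleaned = copy.deepcopy(notebook)
    cleaned.get("metadata", {}).get("language_info", {}).pop("version", None)
    return cleaned


def as_committed(notebook: dict[str, Any]) -> str:
    # Serialise deterministically so that equal notebooks give identical text.
    return json.dumps(notebook, indent=1, sort_keys=True)


alice = {"metadata": {"language_info": {"name": "python", "pygments_lexer": "ipython3", "version": "3.8.8"}}}
bob = {"metadata": {"language_info": {"name": "python", "pygments_lexer": "ipython3", "version": "3.8.10"}}}

print(as_committed(alice) == as_committed(bob))  # False: a spurious diff
print(as_committed(strip_python_version(alice)) == as_committed(strip_python_version(bob)))  # True
```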

Can we change the default behaviour? It is also common for different developers to use different notebook frontends.

At a minimum, I propose supporting the following:

Colab
VSCode
Datalore
Jupyter Notebook
Jupyter Lab

Opening, running, and saving a notebook (without editing it) in any of these five should produce the same metadata.
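One way to get frontend-independent metadata, sketched here as an assumption rather than anything nb-clean implements, is an allowlist: keep only a few agreed notebook-level keys and drop everything else, such as fields a particular frontend adds. The contents of KEEP_KEYS and the helper normalise_metadata are hypothetical choices for illustration:

```python
import copy
from typing import Any

# Hypothetical allowlist of notebook-level metadata keys a team agrees to keep.
KEEP_KEYS = frozenset({"kernelspec", "language_info"})


def normalise_metadata(notebook: dict[str, Any]) -> dict[str, Any]:
    """Return a copy keeping only allowlisted metadata keys, minus the Python version."""
    cleaned = copy.deepcopy(notebook)
    metadata = cleaned.get("metadata", {})
    # Drop frontend-specific keys (e.g. "colab", "vscode") that are not allowlisted.
    cleaned["metadata"] = {k: v for k, v in metadata.items() if k in KEEP_KEYS}
    # As in the current behaviour, the Python version is dropped as well.
    cleaned["metadata"].get("language_info", {}).pop("version", None)
    return cleaned
```

Whether kernelspec belongs on the list is itself a team decision, since it can also differ between contributors.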

This issue was closed due to inactivity. Please reopen if still relevant.