jupyter/nbconvert

Add support for merging/concatenating multiple notebooks

fperez opened this issue · 21 comments

This simple gist offers a command-line tool for concatenating/merging multiple notebooks. As requested by @jamespjh, this could be a useful nbconvert feature (it would also make it robust against evolution of the internal API for users, as they'd only have to remember the cmd line call, and we'd update the internals if the nbformat API changes).

I'm worried about the logic for merging metadata at the notebook level: while in many cases it is obvious what to do, I'm worried about the slippery slope we would get onto when metadata differ.

I would simply make an explicit decision: the metadata is loaded so that it basically corresponds to that of the first nb in the list, plus keys from the others if they differ (the algorithm is simply to do meta.update() with all the notebooks in reverse order from the command line).

That's a simple, unambiguous choice with known semantics. If users don't like it, they can edit it back by hand later.

I don't see a problem with the feature having this constraint.

Ok, I like a strong limitation like that. I came almost to the same conclusion while walking back home.

It might be hard to shoehorn that into the nbconvert structure itself, as right now it's built around the assumption that one exporter converts one notebook, and the looping over all the notebooks is implicit, but we can likely arrange that.

I propose to add a --merge flag that merges all the notebooks into one before feeding it to the rest of the pipeline. Metadata are merged as you proposed:

metadata = {}
for n in reversed(notebooks):
    metadata.update(n.metadata)

and for the name of the notebook (if needed) we use the first one.

This allows us not only to merge, but to merge (virtually) and generate a PDF/HTML version at once.
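A minimal sketch of the metadata rule described above, assuming each notebook's metadata is available as a plain dict. The function name `merge_metadata` and the sample values are hypothetical and are not part of nbconvert. Note that `dict.update` is shallow, so a nested value such as `kernelspec` is taken whole from the first notebook that defines it rather than merged key by key.

```python
def merge_metadata(metadata_list):
    """Merge notebook-level metadata so that earlier notebooks win on conflicts."""
    merged = {}
    # Apply updates last-to-first, so the first notebook's values are written last
    # and therefore override any conflicting keys from later notebooks.
    for meta in reversed(metadata_list):
        merged.update(meta)
    return merged


# Hypothetical metadata from two notebooks given on the command line, in order.
first = {"kernelspec": {"name": "python3"}, "title": "Intro"}
second = {"kernelspec": {"name": "ir"}, "authors": ["A. Author"]}

merged = merge_metadata([first, second])
print(merged)
```

Keys that only appear in later notebooks (here `authors`) are kept, while conflicting keys (here `kernelspec`) come from the first notebook.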

+1

On Mon, Feb 22, 2016, 18:48 Matthias Bussonnier notifications@github.com
wrote:

I propose to add a --merge flag that merges all the notebooks into one
before feeding it to the rest of the pipeline. Metadata are merged as you proposed:

metadata = {}
for n in reversed(notebooks):
    metadata.update(n.metadata)

and for the name of the notebook (if needed) we use the first one.

This allows us not only to merge, but to merge (virtually) and generate a
PDF/HTML version at once.


This would be great. I'm using @fperez's nbmerge.py script from https://gist.github.com/fperez/e2bbc0a208e82e450f69 at the moment, and would be delighted to replace it with a simple invocation of nbconvert.

+1 here. Using nbmerge.py fairly frequently as well.

aoboy commented

I am trying to use fperez's version and I am getting the following error:
Traceback (most recent call last):
  File "nbmerge.py", line 49, in <module>
    merge_notebooks(notebooks)
  File "nbmerge.py", line 38, in merge_notebooks
    print(nbformat.writes(merged))
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 2519513: ordinal not in range(128)

Happen to be using Python 3, @aoboy? I think you're seeing this issue. If so, there's an easy fix mentioned in that thread.

aoboy commented

@npyoung I solved it. I'm actually using Python 2.7.
I changed the line from print(nbformat.writes(merged)) to
print(nbformat.writes(merged).encode('utf-8'))
Basically, the encoding was what was missing.
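For readers hitting the same error, here is a small sketch of the idea behind that fix: encode the notebook text as UTF-8 explicitly instead of relying on the output stream's default codec. The helper name `write_notebook_text` and the sample JSON string are hypothetical. An in-memory `io.BytesIO` stands in for the real destination, which on Python 3 could be `sys.stdout.buffer` or a file opened in binary mode.

```python
import io


def write_notebook_text(text, stream):
    """Write notebook JSON text to a binary stream as UTF-8 bytes."""
    # Encoding explicitly avoids falling back to an ASCII default codec,
    # which cannot represent characters such as U+2013 (en dash).
    stream.write(text.encode("utf-8"))


# Hypothetical notebook fragment containing an en dash, like the one in the traceback.
sample = u'{"source": "pages 1\u20132"}'

buffer = io.BytesIO()
write_notebook_text(sample, buffer)
data = buffer.getvalue()
```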

ketch commented

This capability would be very useful for a book I am currently working on, where each chapter is a Jupyter notebook. This feature would make it simpler to generate the print version.

@ketch have a look at @takluyver's BookBook

Hey guys, I've just created a repo (ipypublish), with a simple workflow/scripts for creating/editing 'publication ready' scientific reports from one or more Jupyter Notebooks (containing matplotlib, pandas, scipy, ...), without leaving the browser. Sorry for the spam but, since I used the gist posted here (thanks!), I thought it might be nice to share.

In particular, it would be great to get any feedback, especially in the case where future Jupyter versions might break (or enhance) this, since I intend to write my doctoral thesis with it!

Ta, Chris

@chrisjsewell Really cool project!! You might be interested in looking at JupyterLab. It looks like your system is a beautiful application of the kind of workflow it makes possible, and you will be able to influence the development of that interface to ensure that it can support the kinds of features you want going forward.

@mpacer thanks :) Yes, I've seen a bit about it; it looks good, and I'll definitely be keeping tabs on it. I see you mentioned easier manipulation of metadata (jupyterlab/jupyterlab#902), which is definitely relevant for my repo (chrisjsewell/ipypublish#1).

From the perspective of my research (atomic/quantum level simulations), I'm really interested in the interactive capability that JavaScript bridging is now offering for 3D graphics (ipywidgets, pythreejs, ipyvolume and my other repo pandas3js) and how it can be applied to the exploratory analysis -> publication workflow that Notebooks offer. Being able to 'pop' out a view of such a GUI to a separate window would definitely be pretty neat.

ketch commented

People interested in this thread may also be interested in this book project, which is a collection of notebooks viewable as PDF, HTML, or executable notebooks and runnable on binder or Microsoft Azure; it's not completely finished but is in an advanced state:

https://github.com/clawpack/riemann_book

We are using bookbook, among several other tools.

Although I have finished reading the thread, I still haven't figured out how to do this. And nbmerge.py failed...
😕

Since it hasn't been mentioned yet in this issue, let me suggest using https://nbsphinx.readthedocs.io/.

It basically concatenates notebooks and creates HTML pages or a LaTeX/PDF from them.

Just a note that this project sort-of exists now: https://github.com/jbn/nbmerge

(FWIW, I think it's better to have a separate tool than to have nbconvert do the merging)

ipynb files are in JSON format. What I do is open all the files I want to merge in a new Python notebook and convert them to dicts. You can then use the 'cells' key to concatenate all the cells (or do whatever else you want), and finally convert the resulting dict back to JSON and export it to a new file.

Here is an example where I import 2 different ipynb files, and merge them into a new ipynb file:

import json

# First file
with open('file1.ipynb', 'r') as file:
    json_1 = file.read()
dict_1 = json.loads(json_1)
cells_1 = dict_1['cells']

# Second file
with open('file2.ipynb', 'r') as file:
    json_2 = file.read()
dict_2 = json.loads(json_2)
cells_2 = dict_2['cells']

# New file (merging the first and second files)
# The first notebook supplies the metadata and format version.
new_dict = dict_1.copy()
# Plain list concatenation is enough here, so numpy is not needed.
new_dict['cells'] = cells_1 + cells_2
with open('new_file.ipynb', 'w') as json_file:
    json.dump(new_dict, json_file)
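The same approach can be generalized to any number of files. The following is only a sketch under stated assumptions: every input is an nbformat 4 notebook (cells stored under a top-level 'cells' key), and the first notebook supplies the metadata and format version. The function name `merge_notebook_files` and the file names in the usage comment are hypothetical.

```python
import json


def merge_notebook_files(paths, output_path):
    """Concatenate the cells of several nbformat 4 notebooks into one file."""
    merged = None
    for path in paths:
        with open(path, "r", encoding="utf-8") as f:
            nb = json.load(f)
        if merged is None:
            # The first notebook provides metadata, nbformat and nbformat_minor.
            merged = nb
        else:
            # Later notebooks only contribute their cells, in order.
            merged["cells"].extend(nb["cells"])
    if merged is None:
        raise ValueError("at least one notebook path is required")
    with open(output_path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-ASCII text readable in the output file.
        json.dump(merged, f, indent=1, ensure_ascii=False)
    return merged


# Example usage (hypothetical file names):
# merge_notebook_files(["file1.ipynb", "file2.ipynb"], "new_file.ipynb")
```

Working on the raw JSON like this skips the validation that the nbformat library performs, so it is best suited to notebooks that are already known to be valid.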

Does the feature for loading a notebook as a module offer an answer for the use case discussed here? https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Importing%20Notebooks.html