almond-sh/almond

Importing a notebook from another notebook

kornelc opened this issue · 5 comments

Hello,

I have a quick question regarding importing common code that sets up the environment for my notebooks:

The import $file directive works great for .sc files, but is it possible to include a notebook from another notebook, kind of like what the %run <subnotebookfile.ipynb> magic does in the Python world?

Basically, I would like to be able to edit the included files easily through the Jupyter UI, just like I can with notebooks, but if it's an .sc file, I cannot easily edit it through the web UI from what I can tell. Or is there a way to edit .sc files through the UI as well?

Yes, you can edit .sc files, or any other text file, with either JupyterLab or the classic Jupyter Notebook UI.

Or if you prefer to edit code in a notebook, I would suggest just doing that, then saving it as a .sc file using nbconvert --to script, which you can then import into another notebook. Straight up importing notebooks from another notebook is generally a bad idea.
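For example (just a sketch, the file names are made up): if the shared setup lives in setup.ipynb, convert it once with jupyter nbconvert --to script setup.ipynb to get setup.sc, and then in any notebook in the same directory:

```scala
// Assumes setup.sc sits next to this notebook; Ammonite's $file import
// compiles it and brings its definitions into scope.
import $file.setup, setup._
```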

@kiendang Thanks for the quick response! I was able to edit the .sc files as you said, which was the main thing I was after. So my main problem is solved now!

Straight up importing notebooks from another notebook is generally a bad idea.

I'm curious, what problems do you see with importing notebooks from other notebooks? I'm asking because a big part of our infrastructure for data scientists is implemented as Databricks Scala notebooks, which form a hierarchy, and that seems to work well for us. That is why I was looking to see if the same thing is doable with Almond.

Consider these three notebooks that implement overnight jobs:

- notebook-to-import-data, which does:
  - import notebook-to-import-data-from-source1
  - import notebook-to-import-data-from-source2

Then you schedule the top notebook to run from cron overnight. Separate devs can work on each sub-notebook, and in the morning after the run, you can look at each notebook's output to see what actually happened. It seems it would be useful to implement functionality to import notebooks for this kind of scenario.

Almond has such a powerful API, especially for importing. Is there something I can do relatively easily, like writing some extension code/plugin/other extra code, that would allow me to implement notebook import if it doesn't yet exist?

Thanks for the quick response! I was able to edit the .sc files as you said, which was the main thing I was after. So my main problem is solved now!

Great!

I'm curious, what problems do you see with importing notebooks from other notebooks? I'm asking because a big part of our infrastructure for data scientists is implemented as Databricks Scala notebooks, which form a hierarchy, and that seems to work well for us. That is why I was looking to see if the same thing is doable with Almond.

Oh sorry, that was just what I've gathered from my personal experience (which resembles what's mentioned in this talk), by no means a criticism. Glad to learn about your use case where importing notebooks works.

Almond has such a powerful API, especially for importing. Is there something I can do relatively easily, like writing some extension code/plugin/other extra code, that would allow me to implement notebook import if it doesn't yet exist?

I'm not sure myself since I'm unfamiliar with how importing notebooks was implemented in Python kernels.
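One rough, untested idea (all file names here are made up, and this assumes Ammonite's repl.load plus the bundled os-lib and ujson are available in the Almond session): read the other notebook's JSON and feed its code cells to the interpreter, something like:

```scala
// Sketch only: run the code cells of sub.ipynb in the current session.
// Assumes the notebook sits next to this one and that repl.load
// (Ammonite's "evaluate this string in the current session") is exposed.
val nb = ujson.read(os.read(os.pwd / "sub.ipynb"))

val code = nb("cells").arr
  .filter(_("cell_type").str == "code")
  .map(_("source").arr.map(_.str).mkString)  // a cell's source is usually a list of lines
  .mkString("\n\n")

repl.load(code)  // definitions land in the calling notebook's state
```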

@kiendang Thanks for the video, I will share it with my team. :-) I agree with many of the points he makes, especially that complex code should be developed in IDEs through a rigorous build process with tests. What we do is develop all our complex logic as "normal" Scala code that exposes a very high-level API, compile it into a JAR, include that JAR as a dependency in our notebooks, and call it from there. So in prod, notebooks only really make very high-level API calls and are often only 5-10 commands long.
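In an Almond notebook, pulling in a prebuilt JAR like that could look roughly like this (the coordinates, path, and API names are made up for illustration):

```scala
// Either pull a published artifact from a repository...
import $ivy.`com.example::datasci-api:1.2.3`

// ...or load a locally built JAR onto the classpath:
interp.load.cp(os.Path("/opt/libs/datasci-api.jar"))

// The notebook itself then stays a handful of high-level calls, e.g.:
// com.example.datasci.Pipeline.runOvernight()
```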

What the %run magic does in Databricks' Scala notebooks is run the other notebook's cells in order, but using your calling notebook's state. So if you register a Spark table with a Spark session in the called notebook, the calling notebook will see that table as if it had registered the table itself. It basically runs the notebook in the context of the parent notebook.
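For example (the view name and path are hypothetical), if the called notebook does something like:

```scala
// In the called notebook: register a temp view against the shared Spark session
val events = spark.read.parquet("/data/source1/events")
events.createOrReplaceTempView("source1_events")
```

then after %run the calling notebook can query source1_events directly, e.g. spark.sql("SELECT count(*) FROM source1_events").show(), because both notebooks ran in the same session.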

I'm closing this issue since I can make things work using .sc files as you described above.

Thanks for your help!