deepfates/memery

Document developer flow

Closed this issue · 4 comments

Developing with nbdev is really nice once you get the hang of it, but for most people it will be a hurdle to getting involved. I would like to make this as easy as possible, both by documenting the process clearly and by making it replicable in code.

If I use a notebook as a sort of developer dashboard, instead of a terminal where the commands are lost, I could both demonstrate and test code in the same place. And I could use it consistently to update the repos and packages, so that I don't miss a step or mess up my git repos and pypi packaging etc.
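As a rough sketch of what such a dashboard cell could look like - assuming nbdev 1.x, where notebook2script() and the nbdev_* console scripts are the available tools; the exact cells are illustrative, not the actual memery dashboard:

# Dashboard cell: rebuild and check the package from inside a notebook
from nbdev.export import notebook2script
import subprocess

notebook2script()                                # tangle every .ipynb into the memery/ package
subprocess.run(["nbdev_clean_nbs"], check=True)  # strip notebook metadata before committing
subprocess.run(["nbdev_test_nbs"], check=True)   # run the notebook test suite

Because those commands live in a cell rather than a shell history, the notebook doubles as a record of the release steps.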

Realizing that this is a bigger obstacle than I thought. Apparently the instinctive thing for Python hackers to do is to edit the .py files directly. And the project layout suggests that the auto-generated memery folder is the source code, rather than the .ipynb files in the top directory. But if you hack on the generated files, you can no longer use nbdev's powerful literate programming tangler, and the code won't merge with upstream.

Hmm. I'm sure that the nbdev project has already dealt with this problem, so I will research their documentation for clues.

I probably need to make the notebook-driven development really obvious, front and center. On GitHub, at least, as that's where people who want to develop on the project will arrive. So I need to rewrite the README -- which, to be clear, is generated from the index.ipynb file -- as a literate program that teaches both memery and nbdev. 😅

I also need to upgrade the documentation for each module. Right now they're mostly notes to self, so that I could come back and catch up with my previous work. But I've stepped away for a while and let it cool off in my mind, so now is the best time to edit for clarity and structure. Editing the documentation is more important than the code, at this point, because right now the biggest bottleneck to development is my limited free time.

Maybe the developer dashboard can be incorporated directly into the README? But maybe it's just a subset of those functions, pulled out for quick access. And a small set of environment variables that we can pass around for testing in the other notebooks. All of that can be built in the README, either way.
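For the shared test settings, a minimal sketch of what that cell could look like - the variable names and values here are hypothetical, just to show the shape:

import os

# Hypothetical settings that the other notebooks could read back via os.environ
os.environ.setdefault("MEMERY_TEST_IMAGE_DIR", "images/")
os.environ.setdefault("MEMERY_VERBOSE", "1")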

Yeah, this does require some developer discipline. I use nbdev too; for reference, here are some gotchas I've encountered when working with nbdev, especially with other people. (I'm writing this out also for myself to reuse in my own docs later, so feel free to borrow / amend any of this.)

nbdev works best when you make edits in notebooks and then export to .py files, without editing the .py files. This exporting step can be a bit cumbersome sometimes - I often forget to run nbdev_build_lib in a separate terminal, for example. Another annoying thing is having to run nbdev_clean_nbs to get rid of the metadata cruft from notebooks, to make them nicer in GitHub / git merges etc. The way I found that worked is to use pre-commit hooks, e.g.: https://github.com/igorbrigadir/nbdev-api-template/blob/master/.pre-commit-config.yaml

Making this integrated with Jupyter is still an open issue: AnswerDotAI/nbdev#321

To sync changes made in a .py script back into the notebook, there is a command line tool: nbdev_update_lib. The catch is that the .py file has to contain the same number of cells as the notebook. For example, if a notebook has one cell marked # export, the exported .py file can only have one cell:

# AUTOGENERATED! DO NOT EDIT! File to edit: dev_test.ipynb (unless otherwise specified).

__all__ = ['foo']

# Cell

def foo(bar):
    pass

print("foobar")

So you can add other stuff to the .py file, and it will end up in the same notebook cell:

# AUTOGENERATED! DO NOT EDIT! File to edit: dev_test.ipynb (unless otherwise specified).

__all__ = ['foo']

# Cell

def foo(bar):
    pass

def test():
    pass

print("foobar")

and then run

nbdev_update_lib

to sync the changes into the notebook, and then

nbdev_build_lib

again, to make sure __all__ picks up any new function names, and it will work.
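If it works as expected, the regenerated file should then start with something like the following, since nbdev collects every non-underscore top-level name from the exported cells into __all__:

__all__ = ['foo', 'test']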

Another really annoying thing that's difficult to debug is when multiple notebooks export to the same file - avoid this at all costs. It makes everything unpredictable. The same kind of goes for exporting multiple .py files from one notebook - also messy.

The default nbdev GitHub Actions will require some modifications - https://github.com/fastai/nbdev_template/blob/master/.github/workflows/main.yml - by default they run all notebooks from start to finish, so if you have any side effects like creating files etc., those will all happen every time the CI actions run. The best way to be selective about which notebooks run in CI is to use something like a wildcard name instead of all of them, so that only a subset are run: nbdev_test_nbs --fname '*_test.ipynb'

I would highly recommend pinning working versions of nbdev, nbconvert and fastcore. In your requirements.txt, for example:

nbdev==1.1.19
fastcore==1.3.21
nbconvert==5.6.1

I've already spent hours and hours fixing stuff I thought was my own error, but turned out to be a minor version of nbdev breaking everything. Yes, minor versions aren't "supposed to" break things, but they do, so it's way easier to just pin a working version and update explicitly.

Resolving merge conflicts in notebooks is a nightmare - it sometimes helps to run nbdev_fix_merge on the broken ones, but sometimes you may have to dive in and edit the JSON yourself. In most cases it's easier to revert and merge them manually in a new notebook. Really, though, it's best to avoid having to do this entirely.

There will come a point where the advantage of notebooks is outweighed by the complexity of your code. At that point, it's OK to let go, finally dump the notebooks entirely, and keep the .py files and maintain those as a larger library. The notebooks can stay and turn into test cases and documentation rather than core code.

Hope this helps you or someone else!

Now that we've moved to directly working on the .py files can we close this?

Yes, thank you @wkrettek