hynek/doc2dash

automatically import nbviewer

bendichter opened this issue · 3 comments

It has become common for Python docs to reference Jupyter notebooks through external links to http://nbviewer.ipython.org/ (e.g. http://nbviewer.jupyter.org/github/cvxgrp/cvx_short_course/blob/master/applications/portfolio_optimization.ipynb). I have found that these notebooks are often the most useful part of the docs, but doc2dash does not capture them automatically. I propose a feature that downloads these notebooks, converts them to HTML, and rewrites the links to point at the local copies.

I've implemented this with the following Python script. I've only tested it on cvxpy.

import os
import re
import subprocess
from glob import glob

from requests import get
from tqdm import tqdm

html_dir = '.../cvxpy/doc/build/html'

# an nbviewer link runs from the host name up to the closing quote of its href
nbv_pattern = re.compile(r'http://nbviewer\.ipython\.org[^"]*')

for filename in glob(os.path.join(html_dir, '**/*.html'), recursive=True):
    with open(filename, 'r') as content_file:
        content = content_file.read()
    # de-duplicate so each notebook is downloaded once per page
    nbv_addresses = sorted(set(nbv_pattern.findall(content)))

    content_out = content
    if nbv_addresses:
        for nbv_address in tqdm(nbv_addresses, desc='downloading and converting notebooks from ' + filename):
            dest = os.path.split(filename)[0]
            name = nbv_address[nbv_address.rfind('/') + 1:]
            nb_fname = name.replace('.ipynb', '.html')

            # download notebook
            dl_address = nbv_address.replace('nbviewer.ipython.org/github', 'raw.githubusercontent.com')
            dl_address = dl_address.replace('/blob/', '/', 1)
            response = get(dl_address)
            response.raise_for_status()  # stop rather than convert an error page

            # write ipynb file
            nb_fullpath = os.path.join(dest, name)
            with open(nb_fullpath, 'wb') as file:
                file.write(response.content)

            # convert notebook (an argument list avoids shell quoting problems in paths)
            subprocess.run(['jupyter', 'nbconvert', '--to', 'html', '-y', '--output-dir', dest, nb_fullpath], check=True)
            os.remove(nb_fullpath)

            content_out = content_out.replace(nbv_address, nb_fname)

        # write file with new paths
        with open(filename, 'w') as content_file:
            content_file.write(content_out)
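
One gap worth noting: the script only matches links on nbviewer.ipython.org, while the example link above is on nbviewer.jupyter.org. A minimal sketch of the URL rewrite that accepts both hosts might look like the following. The names `nbviewer_to_raw` and `NBVIEWER_HOSTS` are hypothetical (not part of doc2dash), the `github/<user>/<repo>/blob/<ref>/<path>` layout is assumed from the example link, and using `https` for the raw URL is also an assumption.

```python
from urllib.parse import urlsplit

# Hosts that appear in this issue; both are assumed to serve the same paths.
NBVIEWER_HOSTS = {"nbviewer.ipython.org", "nbviewer.jupyter.org"}


def nbviewer_to_raw(url):
    """Map an nbviewer GitHub link to a raw.githubusercontent.com URL.

    Returns None when the link is not an nbviewer /github/.../blob/... link.
    """
    parts = urlsplit(url)
    if parts.hostname not in NBVIEWER_HOSTS:
        return None
    segments = parts.path.strip("/").split("/")
    # Assumed layout: github/<user>/<repo>/blob/<ref>/<path to notebook>
    if len(segments) < 6 or segments[0] != "github" or segments[3] != "blob":
        return None
    user, repo = segments[1], segments[2]
    rest = "/".join(segments[4:])  # <ref>/<path to notebook>
    return f"https://raw.githubusercontent.com/{user}/{repo}/{rest}"
```

Parsing the path into segments, rather than chaining `str.replace` calls, means a directory that happens to be named `blob` further down the path is left alone, and links that do not point at GitHub are skipped instead of being rewritten into a broken address.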
hynek commented

I’m sorry I’ve left you hanging for *checks calendar* 7 years. I regularly checked the issue, didn’t quite understand what it meant, and swore to myself to check again soon. I still don’t quite understand what it’s about, but I also don’t think you’re still interested in pursuing this. Sorry.

TBH I don't know what this is about anymore either 😂