Analysis Framework
Design
The first idea was to have only a series of notebooks which are providing an exhaustive description of an analysis.
PB: Reproducibility / Claritity interface.
- How to integrate containers in jupyter ?
> Should we have a specific homemade container per notebook
>> This hinder part of the interest of containers being “atomical”
> Only use conda environments (ie one conda environments per notebooks)
>>> Maybe can be a structure could be, one singularity container with all the conda environments of the analysis.
- How to deal with computing ressources demanding parts of the analysis: in notebook ? as snakemake rules ?
- Structure:
Maybe each analysis notebooks could be separeted in a:
- prior: All requirements needed for the analysis
- main: The actual analysis
- Most ressources demanding computations > In snakemake only ? > easyer when migrating to cluster.
- Testing:
I see 2 ways of testing:
- Using classical pytest with against output files created by a particular notebook
- Using papermill and scrapbook from nteract (this look more cumbersome, ie have to add code to the notebook cells we want to keep the output)
- Another option would be to perform test on the jupytext files (not the notebooks) ?
Dev
Premises [2/2] :0.1:
- [X] Setup a snakemake file launching a jupyter notebooks with papermill using a configuration file
- [X] Define an primary organisation for the jupyter notebooks:
- 1 for summary
- 1 for data downloads
- 1 for general report/figures/tables
Add html export :0.2:
Add singularity support [2/2] :0.3:
- [X] Add container functionality (singularity) in a notebook
- [X] Add container functionality in a snakemake rule
> Still fighting with install of singularity new version (install process looks ugly and dependending on .bashrc !)
> Fixed with AUR, check ~/org/my_manual.org
Use CookieCutter and add first documentation :0.3:
Using CookieCutter and Sphinx
Add preliminary tests :0.3.1:
- Adding notebook test: can get inspiration from https://github.com/diana-hep/pyhf/blob/master/tests/test_notebooks.py or https://discourse.jupyter.org/t/testing-notebooks/701
- [X] simple test using init_repnots script
- [X] More advanced notebook tests using papermill and scrapbook
Implement jupytext light script as templates :0.3.2:
Centralize papermill rule in snakemake file :0.3.3:
- Actually, the rules added using jinja templates could be replaced by one
“papermill rule” which would run papermill on each notebook. Check how to
refactor that.
> Now implemented in advanced_snakefile in templates
> But maybe better in case we want to trace the input/outputs of each notebooks through snakemake later on.
>> So let both option in form of a basic_ and advanced_snakemake file.
NEXT
- Create an index/webpage linkin all html ( a jupyter notebook ? a django server ?)
- Integrate voila server ?