underworldcode/stripy

[REVIEW] Notebooks directory inside stripy source code

santisoler opened this issue · 13 comments

Part of openjournals/joss-reviews#1410

Leaving the source code of the package and the notebooks that show examples of its applications separated is a good practice in order to keep the repository better organized.
I found that the stripy related notebooks are located inside /stripy-src/stripy/Notebooks and linked on the /stripy/Notebooks directory.
I think that moving the notebooks to the latter path would tidy up the repo a little bit, making easier to keep source code separated from documentation/examples.

Do you agree? Have you included the notebooks on that location for some special reason?

The original reason was to be able to bundle them standalone with the pypi installation and install notebooks after a pip install. We can probably move those to a different location but we usually do not have a separate repository for this level of documentation. I expect we can engineer for setup.py to reach up and out to get the notebooks from elsewhere in the repo.

Oh ok! I didn't notice that was the reason for the current location of the notebooks.
So, in light of this new reason let me rethink the issue.

The current location is not harmful, so we could consider it as a possible solution.
I don't quite like to force the setup.py to find files outside its directory. You'll need to hack it and create a non very elegant solution.

I can see two more options here:

  1. There no special reason why Notebooks should be released under pypi. They could be downloaded from stripy once it's already installed. For example, you could create a function that downloads the notebooks from your repo:

    import stripy
    
    stripy.download_notebooks()

    On @fatiando we have a very nice tool for fetching files from the web called pooch. It's very friendly when the files are located on a GitHub repo: you could specify what version of the file you want.

  2. An easier solution would be to refactor the organization of the repo. Instead of having all the source code related files inside stripy-src, you could move all those files to the root. So the repo would look something like this:

    ├── binder
    ├── Docker
    ├── litho1pt0-src
    ├── Notebooks
    ├── paper
    ├── stripy
    │   ├── f77-src
    │   ├── src
    │   ├── __init__.py
    │   ├── cartesian_meshes.py
    │   ├── cartesian.py
    │   ├── documentation.py
    │   ├── spherical_meshes.py
    │   └── spherical.py
    ├── tests
    ├── COPYING
    ├── COPYING.LESSER
    ├── LICENCE
    ├── MANIFEST.in
    ├── README.md
    └── setup.py
    

    With this organization, the Notebooks directory is located on the same level as setup.py, so you could easily add them as package_data to pack the notebooks on pypi.
    Moreover, with this organization you would enable the installation of the latest version of the repo with pip (as I mentioned on #17):

    pip install https://github.com/underworldcode/stripy/archive/master.zip
    

Both solutions are not mutually excluding, so you could apply both simultaneously if you consider it convenient.
Let me know if you like any of them or if you could think a third possible scenario.

As usual, I'm willing to help you implementing any solution you decide to apply.

I personally prefer your second proposal. I think the original motivation for keeping everything tucked away in stripy-src was to de-clutter the root directory, but in hindsight that has created unforeseen issues where license files have not been detected and symbolic links to README.md were required. If we are moving litho1pt0 source to another repository (#26) I see little need to store stripy source code in a subdirectory. @lmoresi what do you think?

Great! Because this refactor could involve major changes, I would recommend creating a PR to do it.
That way we can test if everything is working as expected before merging it.

I don’t think litho1pt0 deserves a publication as it is really just a “thin wrapper” which is discouraged by joss. We can give it a zenodo presence, though.

Ok. I didn't know litho1pt0 is not suited for a JOSS publication. But would be nice if it has its own repo along with a Zenodo doi.

Alright, I have created a restructure branch to deal with the moving everything to the top-level directory with this commit c735c0a. Changes include:

  • /stripy-src -> /
  • /stripy-src/stripy/Notebooks -> /Notebooks

@santisoler: how do you include the Notebooks as package_data? I have the following in setup.py:

package_data      = {'stripy': ['Notebooks/CartesianTriangulations/*ipynb',
                                'Notebooks/SphericalTriangulations/*ipynb',
                                'Notebooks/Data/*'] },

But they are nowhere to be found under stripy package when it is built with pip or distutils. Could you shed some light on why this is happening?

Many thanks!

Yes it's in there - but why would you define package_data if you didn't want it to be included!? Weird option.

Alright, I have created a restructure branch to deal with the moving everything to the top-level directory

Great @brmather! Would you mind opening a Pull Request for the restructure branch so we could continue the discussion over there? The Pull Requests interface offers a lot of tools for discussing over the proposed code, like reviews, comments or even changes suggestion from any collaborator.
(In case you don't know how to do it, you could get very nice instructions from this GitHub article)

Instead of waiting for that PR, I'll try to answer here so I don't delay you.

I'm seeing that setup.py import two setup functions:

from setuptools import setup, find_packages
from numpy.distutils.core import setup, Extension

The script will ultimately use only the last one of them, so would be better to remove the first import.
I'm not used to numpy.distutils.core, so I'm not so familiar with this specific setup.py.
Is there a reason why you should import setuptools as well?

According to this StackOverflow answer, if you want to use setuptools you need to import it as a whole before importing the numpy.distutils.core related functions and classes:

import setuptools
from numpy.distutils.core import setup, Extension

I'm sure you have more knowledge and experience than me on this subject. Please enlighten me.

@santisoler: how do you include the Notebooks as package_data?

But they are nowhere to be found under stripy package when it is built with pip or distutils. Could you shed some light on why this is happening?

Ok. Maybe my idea of leaving the Notebooks outside the stripy directory is not a good one because the files included on package_data must be inside the package directory.
This is mainly due to the fact that we want the Notebooks to be distributed inside the pypi package.
So I think the best choice would be to keep the Notebooks inside stripy.
Nevertheless, I think we could still apply some of the changes of restructure branch.

Maybe, we can update the repo structure to look like this:

├── binder
├── Docker
├── litho1pt0-src
├── paper
├── stripy
│   ├── f77-src
│   ├── Notebooks
│   ├── src
│   ├── __init__.py
│   ├── cartesian_meshes.py
│   ├── cartesian.py
│   ├── documentation.py
│   ├── spherical_meshes.py
│   └── spherical.py
├── tests
├── COPYING
├── COPYING.LESSER
├── LICENCE
├── MANIFEST.in
├── README.md
└── setup.py

Regarding @lmoresi comment about include_package_data = True, the setuptools.py documentation specify three ways for including data files:

  • include_package_data
    Accept all data files and directories matched by MANIFEST.in.
  • package_data
    Specify additional patterns to match files that may or may not be matched by MANIFEST.in or found in source control.
  • exclude_package_data
    Specify patterns for data files and directories that should not be included when a package is installed, even if they would otherwise have been included due to the use of the preceding options.

So, I think if you choose to use the include_package_data, the package_data could be omitted. The same goes otherwise: if using package_data, the include_package_data could be omitted.
In conclusion, if you use package_data = {"..." = ["..."]} you don't need to specify include_package_data.
I'm familiar with setuptools and the second option (using package_data), but I'm not entirely sure if numpy.distutils.core is really using setuptools or just distutils.

OK I moved the Notebooks back inside /stripy again with b1a135d.

We need numpy.distutils.core to compile fortran sources. As far as I'm aware setuptools is not capable of doing that.

In your proposed directory restructure, you have kept litho1pt0-src in there. I was under the impression this should be moved to a separate repository under the underworldcode group. Is this still the idea? I've removed it and all associated notebooks in this pull request #31

I think we do move it. Some folks like to have related projects together, some do not. However, this one was always an uncomfortable marriage and I prefer the unencumbered stripy to be honest.

In placing the Notebooks within stripy, I followed a pattern that has been advocated elsewhere for distributing documentation in this form. As we have seen from trying alternatives, no choice is perfect but this one does work and it does have some precedent.

When we get to that point, it seems to be a matter of taste and we can defend that with readers of JOSS and users of the code.

OK I moved the Notebooks back inside /stripy again with b1a135d.

👍

We need numpy.distutils.core to compile fortran sources. As far as I'm aware setuptools is not capable of doing that.

Yes, you're right. But, from what I've read, I understood that importing setuptools would enhance the setup function from numpy.distutils.core. My main concern about this was not functionality, but the presence of two setup functions on the same script, what could cause problems in the future if you don't remember exactly which one you must use.

In your proposed directory restructure, you have kept litho1pt0-src in there. I was under the impression this should be moved to a separate repository under the underworldcode group. Is this still the idea? I've removed it and all associated notebooks in this pull request #31

Yes. Sorry for that, maybe I should have remove it from the structure.
My plan was to solve the notebooks location on this issue, leaving the presence of litho1pt0 for #26.
That's why I left the litho1pt0 on the proposed structure, just to make the two issues independent of each other.

In placing the Notebooks within stripy, I followed a pattern that has been advocated elsewhere for distributing documentation in this form. As we have seen from trying alternatives, no choice is perfect but this one does work and it does have some precedent.

When we get to that point, it seems to be a matter of taste and we can defend that with readers of JOSS and users of the code.

I agree with that solution. Sometimes we open an issue that won't ultimately solve any problem, but make us rethink why we have designed some piece of software as it is and why it's the best solution we can find.
I think this kind of discussion make software more robust.