MeteoSwiss-APN/mch-python-blueprint

Document inclusion of non-Python files

Closed this issue · 3 comments

  • Figure out whether it is possible to properly include non-Python files alongside code during install without putting them in src/<package>/
    • Shapefiles for map plots, CSV or other small data files, ... (not talking about binary data files or files required for tests!)
  • Document how to include data files (whether inside or outside src/<package>/)

Note

At the time of writing, pyflexplot successfully installs data files (shapefiles, preset setup files) alongside the code by adding them to MANIFEST.in. However, keeping these files in src/ is not ideal; they would be better installed in a separate data/ directory.

Did some research on how to pull data/ out of src/<package>/:

  • Unfortunately, there does not seem to be a clean, standard way to achieve this!
  • According to this answer to the same question it should be possible, but I couldn't get it to work for pyflexplot, and I suspect it would not work properly with editable installs.
  • It is possible to separately install data files (see here), but their location is semi-hardcoded and needs to be accessed via, e.g., sys.path, which is not overly robust.
  • In conclusion, keeping the data in src/<package>/data/ still appears to be the best of several bad solutions...
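To illustrate why the separately installed variant feels fragile, here is a minimal sketch of the kind of lookup it requires at runtime. It assumes the files were installed via setuptools' data_files with a relative target directory, which is resolved against the installation prefix (sys.prefix, or the user base for user installs). The helper name find_data_file and the path share/mypkg/presets.csv are hypothetical, not taken from pyflexplot.

```python
import site
import sys
import tempfile
from pathlib import Path
from typing import Iterable, Optional


def find_data_file(relative: str, bases: Optional[Iterable[str]] = None) -> Optional[Path]:
    """Return the first existing ``base / relative``, or None if nothing matches."""
    if bases is None:
        # Assumption: data_files with a relative target end up below one of
        # these prefixes; which one depends on how the package was installed.
        bases = [sys.prefix, site.getuserbase()]
    for base in bases:
        candidate = Path(base) / relative
        if candidate.exists():
            return candidate
    return None


# Demo with a temporary directory standing in for the installation prefix.
with tempfile.TemporaryDirectory() as fake_prefix:
    target = Path(fake_prefix) / "share" / "mypkg" / "presets.csv"
    target.parent.mkdir(parents=True)
    target.write_text("name,value\n")
    found = find_data_file("share/mypkg/presets.csv", bases=[fake_prefix])
    found_matches = found == target

# With no candidate prefixes there is nothing to find.
missing = find_data_file("share/mypkg/does_not_exist.csv", bases=[])
```

The code has to guess the prefix and silently returns None when the guess is wrong, which is the robustness problem mentioned above.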

Quote from here:

Typically python packages are installed as zip files, meaning that their source is not available as files on the file system. The zip_safe = True flag means that this is okay for your package. If you make use of __file__ attributes to find included data files in your package you will probably need to set zip_safe = False. But instead of doing that, please be kind to your users and consider using either importlib.resources or pkg_resources:

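from json import load
from pkg_resources import resource_stream


def load_schema():
    # Read data/schema.json from the "example" package and close the stream afterwards.
    with resource_stream("example", "data/schema.json") as stream:
        return load(stream)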

This will also work when your package is installed as a zip file.
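For comparison, the importlib.resources route mentioned in the quote could look roughly like the sketch below. It assumes Python 3.9+ (for importlib.resources.files) and data kept inside the package, i.e. the src/<package>/data/ layout. The package name demo_pkg_with_data and the helper make_demo_package are hypothetical and only exist so the example runs on its own.

```python
import importlib
import json
import sys
import tempfile
from importlib.resources import files  # available from Python 3.9
from pathlib import Path


def load_schema(package: str) -> dict:
    """Read data/schema.json shipped inside ``package``."""
    resource = files(package) / "data" / "schema.json"
    return json.loads(resource.read_text(encoding="utf-8"))


def make_demo_package(root: Path, name: str) -> None:
    """Create a throwaway package with one data file (demo only)."""
    data_dir = root / name / "data"
    data_dir.mkdir(parents=True)
    (root / name / "__init__.py").write_text("")
    (data_dir / "schema.json").write_text(json.dumps({"type": "object"}))


# Build the hypothetical package in a temporary directory and load its data file.
with tempfile.TemporaryDirectory() as tmp:
    make_demo_package(Path(tmp), "demo_pkg_with_data")
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        schema = load_schema("demo_pkg_with_data")
    finally:
        sys.path.remove(tmp)

print(schema)
```

Note that the lookup is relative to an importable package, so this sketch says nothing about files that live outside src/<package>/.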

With resource_stream, it might be possible to access data files outside of src/, which would solve the problem!

TODO: Try this out!

Closing as related to #39.