CardiacModelling/BenNevis

Make nevis installable

Closed this issue · 13 comments

  • Add license (that clearly mentions licenses for other data, included and downloaded)
  • Make bnglonlat pip installable
  • Remove fit script. Separate issue: #42
  • Move hills.zip into module, make setup.py install it (and delete it)
  • Move terrain data to system shared (not user!) dir
  • Stop auto-downloading data, add separate command
  • Update installation instructions
  • Move version to separate file
  • Add setup.py
  • Add to pypi

I'd like to start working on this issue first. Could you give more details about the "entry point" script? E.g. where I should put the code and how the map is supposed to be shown.

I haven't worked out if this is a good idea yet (w.r.t. downloading data, licensing etc.), so please just fork or use submodules for now :-)

The problem with this idea is that nevis downloads ~160 MB of data, then creates a 1.5 GB NumPy file with terrain data, and optionally 2.9 GB of spline data. This space needs to be freed again when the package is uninstalled, but setup.py and pip don't provide a good mechanism for that.

So we could

(1) Add placeholder files to the package that are overwritten when the real caches are downloaded, so that they get removed when the package is uninstalled. This has several major drawbacks:

  • If it goes wrong, we leave ~5 GB of data somewhere in the filesystem (something/python/something/site-packages/etc)
  • We need to remember each cache file. If we forget one, we start leaking data
  • If you install with root/admin permissions, you then need the same permissions to run the download

So this is not an option.

(2) Place the files somewhere in the user directory. Instead of a hidden place (e.g. ~/.cache/), we could put them somewhere noticeable like ~/ben-nevis-cache/. This at least makes them visible (although I'm not sure Windows users often look in the top level of their user folder?), but it is still quite annoying...

Thoughts anyone? @mirams @EricWay1024 ?

I think it's okay to use (1). As for the drawbacks you mentioned:

  • We have a limited number of cache files, so it should not be a big problem to manage them and ensure that they are removed as expected;
  • I suppose that if users can install with root permissions, they can also download with the same permissions. At the very least, we can provide some documentation about what users should do in such scenarios.

Alternatively, we could prompt the user and ask them to decide where to put the data files. Some users might be willing to put the files in a fixed place under their user directory, so that they don't have to download the same data again for every venv they create.

There are plenty of situations where you want to install as root but never run as root (e.g. if a sysadmin is in charge of installing, or just if you don't want to give root permissions to random modules downloaded from the internet!). Option 1 is definitely out!

I'm happier with (2), but I don't think we can prompt users at any point. When you run import nevis and then nevis.gb(), you are not expecting a prompt to appear! So we need to pick a place that users will notice but not mind too much.

Discussed it with some more people. It seems the best (least bad) way to do it is to create a clearly named directory in the user's home directory (similar to e.g. VirtualBox and Weka).

I've noticed that the nltk package also needs to download some data to work. It provides users with an interactive download function (see here), and the data is saved to the user's home directory (~/nltk_data/) if you don't run it as an admin. I guess we could copy their approach?

I like the idea of making users load the data as a separate step.

I think it's better to avoid import-time side effects, so we should neither prompt the user nor download files on import nevis. What I meant is a function dedicated to downloading files (and probably prompting the user for a save location), like the one in nltk.
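A dedicated, explicitly called download function of the kind described here could look roughly like the minimal sketch below. It is hypothetical: nevis does not necessarily have a function called download, the default folder name ben-nevis-cache is borrowed from the example earlier in the thread, and fetch is a placeholder for the real download so that the prompt logic can run without network access.

```python
from pathlib import Path
from typing import Callable, Optional

# Hypothetical default location: a clearly named folder in the user's home dir.
DEFAULT_DIR = Path.home() / "ben-nevis-cache"


def download(ask: Callable[[str], str] = input,
             fetch: Optional[Callable[[Path], None]] = None) -> Path:
    """Explicit, user-invoked download step (hypothetical API).

    Asks where to save the data and falls back to DEFAULT_DIR if the
    answer is empty. `fetch` stands in for the real download; it is
    injected so the prompt logic can be tested offline.
    """
    answer = ask(f"Where should the nevis data be saved? [{DEFAULT_DIR}] ").strip()
    target = Path(answer).expanduser() if answer else DEFAULT_DIR
    # Create the folder only now, when the user has explicitly asked for it.
    target.mkdir(parents=True, exist_ok=True)
    if fetch is not None:
        fetch(target)
    return target
```

Because the module body only defines a constant and a function, importing it has no side effects: nothing is prompted for or downloaded until someone calls download().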

Yeah, I agree. (It's a runtime side effect, btw, not import-time; I said that wrong a few posts ago.)

How about:

  1. The data location is set either by an environment variable or, by default, as a standard (platform-dependent) path.
  2. Trying to use nevis (not just importing it) raises an exception if the data can't be found. This exception explains how to install it.
  3. The install command shows the current data path, tells the user they can change it with the environment variable, and prompts to continue (y/n).
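The three steps of this proposal could be sketched roughly as follows. Everything named here is an assumption for illustration only: the environment variable NEVIS_PATH, the default folder (Path.home() stands in for a properly platform-dependent path), the marker file terrain.npy used to detect an installed copy, and the functions data_dir, require_data and install. The download argument is a placeholder for the real download and processing step.

```python
import os
from pathlib import Path
from typing import Callable

# All names below are hypothetical; the thread does not fix them.
ENV_VAR = "NEVIS_PATH"
DEFAULT_DIR = Path.home() / "ben-nevis-cache"
MARKER = "terrain.npy"  # file whose presence means "data is installed"


class DataNotFoundError(FileNotFoundError):
    """Raised when nevis is used before its data has been installed."""


def data_dir() -> Path:
    """Step 1: use the environment variable if set, else the default path."""
    value = os.environ.get(ENV_VAR)
    return Path(value).expanduser() if value else DEFAULT_DIR


def require_data() -> Path:
    """Step 2: called when the data is used, never at import time."""
    path = data_dir()
    if not (path / MARKER).exists():
        raise DataNotFoundError(
            f"nevis data not found in {path}. Run the install command to "
            f"download it, or set {ENV_VAR} to point at an existing copy."
        )
    return path


def install(ask: Callable[[str], str] = input,
            download: Callable[[Path], None] = lambda p: None) -> bool:
    """Step 3: show the path, mention the variable, ask y/n, then download."""
    path = data_dir()
    print(f"nevis data will be stored in: {path}")
    print(f"To use a different location, set the environment variable {ENV_VAR}.")
    if ask("Continue? [y/n] ").strip().lower() not in ("y", "yes"):
        print("Cancelled.")
        return False
    path.mkdir(parents=True, exist_ok=True)
    download(path)  # placeholder for the real download/processing step
    return True
```

Keeping the check inside require_data, rather than at module level, means import nevis stays free of side effects, and the installation instructions only appear when something actually tries to use the data.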

I agree with this idea.

That sounds sensible to me.