sacdallago/bio_embeddings

Change default cache dir?

Closed this issue · 5 comments

I'm having a problem with these two specific lines:

https://github.com/sacdallago/bio_embeddings/blob/develop/bio_embeddings/utilities/remote_file_retriever.py#L49
https://github.com/sacdallago/bio_embeddings/blob/develop/bio_embeddings/utilities/remote_file_retriever.py#L106

My default cache directory at work has a limit of 10 GB, so downloading the models fails. How do I override appdirs.user_cache_dir and specify my own cache directory?

I tried this, but it didn't work:

from appdirs import user_cache_dir

def user_cache_dir_custom(folder):
    return "/my/own/cache/dir/data/06_models/"+folder

user_cache_dir = user_cache_dir_custom

Found a quick fix:

  1. Install the package gorilla
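  2. In whatever script uses bio_embeddings, add this code at the top, before bio_embeddings is imported:
# Change the cache dir
import gorilla
import appdirs

def user_cache_dir(folder):
    # Replacement for appdirs.user_cache_dir: put every cache under a custom root
    return "/your/cache/dir/" + folder

# allow_hit=True lets gorilla overwrite the attribute that already exists on appdirs
patch = gorilla.Patch(appdirs, 'user_cache_dir', user_cache_dir, settings=gorilla.Settings(allow_hit=True))
gorilla.apply(patch)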

In my case, I added this code to the beginning of EAT/eat.py from https://github.com/Rostlab/EAT

You can set XDG_CACHE_HOME (https://github.com/ActiveState/appdirs/blob/8eacfa312d77aba28d483fbfb6f6fc54099622be/appdirs.py#L313). But it might be easier to just make .cache/bio_embeddings a symlink to the storage dir
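
A minimal sketch of the XDG_CACHE_HOME route, assuming Linux or another non-macOS Unix (appdirs uses fixed locations on macOS and Windows) and a hypothetical storage path `/big/storage/cache`. The helper `linux_cache_dir` is not part of appdirs; it only mirrors the lookup at the linked line so the effect can be seen without installing anything.

```python
import os

def linux_cache_dir(appname):
    """Hypothetical helper mirroring appdirs' Linux lookup for user_cache_dir."""
    # Falls back to ~/.cache when XDG_CACHE_HOME is not set.
    base = os.getenv("XDG_CACHE_HOME", os.path.expanduser("~/.cache"))
    return os.path.join(base, appname)

# Set the variable before anything asks for the cache directory.
os.environ["XDG_CACHE_HOME"] = "/big/storage/cache"  # hypothetical path
print(linux_cache_dir("bio_embeddings"))
```

Exporting the variable in the shell before starting Python has the same effect, but note that it moves the cache of every program that honours XDG_CACHE_HOME, not just bio_embeddings.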

Thanks for the alternatives! Forgot about symlink, that's probably the easiest way
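
For anyone who wants to script the symlink, here is a rough sketch using only the standard library. `link_cache` is a hypothetical helper, and the demo runs in a throwaway temp directory; in practice `cache_dir` would be `~/.cache/bio_embeddings` and `storage_dir` a folder on the larger volume.

```python
import tempfile
from pathlib import Path

def link_cache(cache_dir: Path, storage_dir: Path) -> None:
    """Make cache_dir a symlink pointing at storage_dir (hypothetical helper)."""
    storage_dir.mkdir(parents=True, exist_ok=True)
    cache_dir.parent.mkdir(parents=True, exist_ok=True)
    # Refuse to clobber an existing cache; move or delete it by hand first.
    if cache_dir.is_symlink() or cache_dir.exists():
        raise FileExistsError(f"{cache_dir} already exists; move or remove it first")
    cache_dir.symlink_to(storage_dir, target_is_directory=True)

# Demo with placeholder paths inside a temporary directory.
root = Path(tempfile.mkdtemp())
cache = root / ".cache" / "bio_embeddings"
storage = root / "storage" / "bio_embeddings"
link_cache(cache, storage)
```

If models were already downloaded into the old cache directory, move them to the storage directory before creating the link so they are not fetched again.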

> Found a quick fix:

> be easier to just make .cache/bio_embeddings a symlink to the storage dir

Any plans on creating a non-hacky way of specifying the cache directory?

Hi @nick-youngblut, I'll consider adding the feature to the next release, but there are a couple of more burning issues on my list (although it's a small change and should be quick to do... but you never know what goes wrong in the process :))