schuderer/mllaunchpad

Have long preparation/loading happen before (not at) first API call

schuderer opened this issue · 1 comments

When I use word embeddings, I can configure them as a binary_file data source and have data sources pre-loaded by setting api:preload_datasources: True. But I still need to load the raw data into an object (using gensim), which unfortunately takes a long time.

This loading happens on the first API call (because that is the only place where I can currently put my user code). The call then times out, and (in my setup) the WSGI app is also killed and restarted automatically.

I would like this to happen on startup of the WSGI app (before the first API call).

Suggestion:
I would like to be able to add an optional function prepare() to my ModelInterface-inheriting class. It would receive all the important arguments (model_conf, data_sources, data_sinks, and model), would be called (if defined) when the API is starting up, and could prepare/cache objects (e.g. by assigning them to self, or through other means).
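
To make the suggestion concrete, here is a rough sketch of how the optional hook might look. Everything in it is an assumption for illustration: the ModelInterface class is a simplified stand-in (not the real mllaunchpad class), start_api() and load_embeddings() are hypothetical helpers, the predict() signature is assumed, and the data source is a plain dict instead of a real data source object.

```python
class ModelInterface:
    """Simplified stand-in for mllaunchpad's ModelInterface (assumption)."""

    def predict(self, model_conf, data_sources, data_sinks, model, args_dict):
        raise NotImplementedError


def load_embeddings(raw):
    """Hypothetical placeholder for the slow gensim loading step."""
    return dict(raw)


class EmbeddingModel(ModelInterface):
    def prepare(self, model_conf, data_sources, data_sinks, model):
        # Proposed optional hook: do the slow work once, cache it on self.
        self.embeddings = load_embeddings(data_sources["embeddings"])

    def predict(self, model_conf, data_sources, data_sinks, model, args_dict):
        # The cached object is already available at the first API call.
        return {"vector": self.embeddings[args_dict["word"]]}


def start_api(model_instance, model_conf, data_sources, data_sinks, model):
    """Hypothetical startup step: call prepare() only if the class defines it."""
    prepare = getattr(model_instance, "prepare", None)
    if callable(prepare):
        prepare(model_conf, data_sources, data_sinks, model)
    return model_instance
```

Because the hook is looked up with getattr(), existing model classes without a prepare() method would keep working unchanged.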

We'll have to check whether e.g. gunicorn will also kill the process if it takes a long time to start up (I don't expect it to).
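
For that check, a gunicorn config file along these lines might be a starting point. It is only a sketch: preload_app and timeout are existing gunicorn settings, but the value 120 is an arbitrary assumption, and whether a slow startup actually triggers a kill in this setup still needs to be tested.

```python
# gunicorn.conf.py (sketch; values are assumptions, not tested with mllaunchpad)

# Load the WSGI application in the master process before workers are forked,
# so slow preparation would run once at startup rather than inside a worker.
preload_app = True

# Seconds a worker may stay silent before gunicorn kills and restarts it.
# Raised from the default of 30 in case preparation ends up inside a worker.
timeout = 120
```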