schuderer/mllaunchpad

Make resource IO thread-safe

schuderer opened this issue · 2 comments

Right now, the main use case is to train/test a model once, then run its prediction API (and maybe occasional re-testing). The heavy load is on the API, so the model is only ever read, never updated concurrently.

But when we enable online learning, every prediction API call could potentially create a new, updated model. One problem here is of course performance (so some sort of delayed serialization is probably in order). Another problem is concurrency between threads (let's just implement this for threads, not processes, in the beginning!). Theoretically, the model could be updated from different threads at the same time, and we need to make sure this is possible WITHOUT blocking other API calls.
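For the read side, there may not even be a need for locking: in CPython, rebinding an attribute is atomic under the GIL, so prediction threads can snapshot a reference to the current model while an updater swaps in a new one. A minimal sketch (names like `ModelHolder` are illustrative, not existing mllaunchpad API):

```python
import threading

class ModelHolder:
    """Hypothetical wrapper around the currently active model.

    Readers snapshot the reference once per call; the writer rebinds it
    atomically, so readers always see either the old or the new model,
    never a half-updated one, and are never blocked.
    """

    def __init__(self, model):
        self._model = model

    def predict(self, x):
        model = self._model  # snapshot the reference once
        return model(x)

    def swap(self, new_model):
        self._model = new_model  # atomic rebind under the GIL


holder = ModelHolder(lambda x: x + 1)
results = []

def reader():
    for _ in range(1000):
        results.append(holder.predict(1))

def writer():
    for i in range(100):
        holder.swap(lambda x: x + 1)  # stand-in for a retrained model

threads = [threading.Thread(target=reader) for _ in range(4)]
threads.append(threading.Thread(target=writer))
for t in threads:
    t.start()
for t in threads:
    t.join()

assert all(r == 2 for r in results)  # every read saw a consistent model
```

This only covers in-memory access; serializing the model to disk is the slow part that needs the queue described below.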

The suggestion would be to implement a pattern similar to an IO queue monad. Calls to the resource module to write the new model (or read the old one, for that matter) would return immediately, but not necessarily be executed immediately: the data to update the model with would be put on a queue that a separate thread chugs along on, serializing whatever comes in. That thread could even throw away invalid updates whose base model version has in the meantime been superseded by someone else -- this is a choice; alternatively, we can force every update to happen in place, but then API calls would have to wait for other API calls to finish.
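The write-behind queue with the "throw away superseded updates" choice could look roughly like this (all names, e.g. `AsyncModelWriter` and `persist`, are hypothetical, not part of the current resource module):

```python
import queue
import threading

class AsyncModelWriter:
    """Hypothetical write-behind serializer for model updates.

    API threads call submit() and return immediately; a single worker
    thread drains the queue and performs the slow serialization,
    skipping updates whose base version has already been superseded.
    """

    _STOP = object()  # sentinel to shut the worker down cleanly

    def __init__(self, persist):
        self._persist = persist  # the (slow) serialization function
        self._queue = queue.Queue()
        self._latest_version = -1
        self._thread = threading.Thread(target=self._worker, daemon=True)
        self._thread.start()

    def submit(self, version, model):
        """Called from API threads; never blocks on IO."""
        self._queue.put((version, model))

    def _worker(self):
        while True:
            item = self._queue.get()
            if item is self._STOP:
                break
            version, model = item
            # Discard updates based on a model version that another
            # thread has since superseded (the "choice" from above).
            if version <= self._latest_version:
                continue
            self._latest_version = version
            self._persist(model)  # slow IO happens off the API threads

    def close(self):
        self._queue.put(self._STOP)
        self._thread.join()


written = []
writer = AsyncModelWriter(persist=written.append)
writer.submit(1, "model-v1")
writer.submit(3, "model-v3")
writer.submit(2, "model-v2")  # based on an older version: dropped
writer.close()
print(written)  # → ['model-v1', 'model-v3']
```

Since the queue is FIFO and there is only one worker, writes still land in submission order; the version check is what keeps a slow, stale update from clobbering a newer model on disk.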

Good luck. :)

The same might be relevant for (some types of) data sinks.

Related to #32