geneontology/minerva

Explore adding auto-flush to disk to minerva

kltm opened this issue · 9 comments

kltm commented

Basically, we currently flush minerva to disk using an external script (dating back to when we thought an administrative interface would be more fully realized with barista). Unfortunately, doing things this way causes problems (e.g. #378 (comment)) that make automation and checks more difficult.

We'd like to explore adding a thread to minerva that would flush it to the filesystem at an interval specified on the command line (in minutes or hours, with fractions allowed, or in whatever standard format is convenient). It would note each flush in its log.

This would remove the need for a script that is currently guaranteed to fail in minerva setups. That would make the initial setup of new instances a little easier (one fewer repo and no crontab entry needed) and improve monitoring by giving us something that actually "works".
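A minimal sketch of the periodic-flush thread described above, written in Python for brevity rather than as minerva code. The interval syntax (`30m`, `0.5h`), the names `parse_interval` and `PeriodicFlusher`, and the `flush` callback are all hypothetical. The callback stands in for whatever routine dumps the models to the filesystem.

```python
import logging
import re
import threading
from typing import Callable

log = logging.getLogger("flusher")

# Hypothetical interval syntax: "<number><unit>", unit m (minutes) or h (hours).
_INTERVAL = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([mh])\s*$")


def parse_interval(text: str) -> float:
    """Convert e.g. '30m' or '0.5h' (fractions allowed) to seconds."""
    match = _INTERVAL.match(text)
    if match is None:
        raise ValueError(f"bad interval: {text!r}")
    value, unit = float(match.group(1)), match.group(2)
    seconds = value * (60 if unit == "m" else 3600)
    if seconds <= 0:
        raise ValueError("interval must be positive")
    return seconds


class PeriodicFlusher:
    """Calls `flush` every `interval` seconds on a background daemon thread."""

    def __init__(self, interval: float, flush: Callable[[], None]) -> None:
        self._interval = interval
        self._flush = flush
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()

    def _run(self) -> None:
        # Event.wait returns True as soon as stop() is called, ending the loop;
        # otherwise it times out after one interval and we flush.
        while not self._stop.wait(self._interval):
            log.info("Flushing models to disk")
            try:
                self._flush()
            except Exception:
                # Keep the thread alive so one failed dump does not stop later ones.
                log.exception("Flush failed")
```

Usage would look something like `PeriodicFlusher(parse_interval("0.5h"), dump_models).start()`, where `dump_models` is a hypothetical dump routine. Waiting on a `threading.Event` instead of sleeping lets `stop()` end the loop right away rather than after a full interval.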

kltm commented

Tagging @balhoff

kltm commented

This would fold into current work we're doing with the dockerization and automated deployment of noctua into AWS.

As discussed in the tech call, we may implement this by just writing out a model at the same time it's saved to the database. This way we don't needlessly dump all models.
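A minimal sketch of flush-on-save, assuming a store where each save also writes just that one model to disk. This is not the #505 implementation; `ModelStore`, the in-memory `database` dict standing in for the real database, and the `.ttl` file naming are all hypothetical.

```python
import os
import tempfile
from pathlib import Path


class ModelStore:
    """Hypothetical store: every save also writes that single model to disk."""

    def __init__(self, export_dir: str) -> None:
        self.database: dict[str, str] = {}  # stand-in for the real database
        self.export_dir = Path(export_dir)
        self.export_dir.mkdir(parents=True, exist_ok=True)

    def save(self, model_id: str, serialized: str) -> Path:
        self.database[model_id] = serialized
        # Only the model being saved is written; nothing else is dumped.
        return self._write_file(model_id, serialized)

    def _write_file(self, model_id: str, serialized: str) -> Path:
        target = self.export_dir / f"{model_id}.ttl"
        # Write to a temp file in the same directory, then rename it over the
        # target so readers never see a half-written model file.
        fd, tmp = tempfile.mkstemp(dir=self.export_dir, suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as handle:
                handle.write(serialized)
            os.replace(tmp, target)
        except BaseException:
            os.unlink(tmp)
            raise
        return target
```

The temp-file-then-`os.replace` step matters if a separate job commits the directory on a timer: it will only ever pick up complete model files.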

@kltm I implemented flush-on-save in #505. I've merged this functionality to the dev branch.

kltm commented

Awesome, I'll try and get this up immediately.

kltm commented

Following the earlier discussion on the technical call, I'm going to increase the commit frequency to every 5 minutes and keep pushes at every 30 minutes.
Also from that discussion: we wondered whether we could split commits up by individual model (like a bash loop). Combined with a high commit frequency, we would essentially have a history system tied to saves. We could even set up some metadata injection in minerva (authors since the last save) that would make the data very mineable and allow blames.
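A rough sketch of the per-model commit idea, as a Python stand-in for the bash loop. It assumes the models live in a git working tree and that model paths contain no spaces and are never renamed (`git status --porcelain` quotes such paths and reports renames as `old -> new`, which this parser does not handle). The function names and the `Update <path>` commit message are hypothetical.

```python
import subprocess
from typing import List


def changed_paths(porcelain: str) -> List[str]:
    """Paths from `git status --porcelain` (v1) lines of the form 'XY PATH'."""
    paths = []
    for line in porcelain.splitlines():
        if len(line) > 3:
            paths.append(line[3:])  # skip the two status chars and the space
    return paths


def plan_commits(paths: List[str]) -> List[List[str]]:
    """One `git add` + `git commit` per model file, so history is per model."""
    commands = []
    for path in paths:
        commands.append(["git", "add", "--", path])
        # Giving a path to `git commit` commits only that file.
        commands.append(["git", "commit", "-m", f"Update {path}", "--", path])
    return commands


def commit_each_model(repo: str) -> None:
    status = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo, check=True, capture_output=True, text=True,
    ).stdout
    for argv in plan_commits(changed_paths(status)):
        subprocess.run(argv, cwd=repo, check=True)
```

Running `commit_each_model` on the 5-minute schedule would give one commit per changed model. The commit message is the natural place to inject author metadata from minerva.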

kltm commented

@balhoff @vanaukenk I looked around for a ticket to add the "history" idea to, but everything seems old and closed out. Do we want to start a new project proposal to look at history?

Sounds good to me.

kltm commented

@vanaukenk This has now been deployed to production.
The commit frequency is now 5 minutes and the push frequency is 30 minutes.