SenteraLLC/geoml

Make a plan for using NewtDB (an extension of ZODB for PostgreSQL)

tnigon opened this issue · 0 comments

NewtDB docs

As I understand it, NewtDB is well suited to storing Python objects in a database. After feature selection, tuning, and training, we will want to store the resulting object, then retrieve it later to make predictions on new data. Once an object is stored, NewtDB lets us choose whether to load the entire object or only its data. This flexibility should be useful depending on what we are trying to achieve.
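To make the "whole object vs. just the data" idea concrete, here is a minimal sketch. It is not NewtDB's API: it uses only the standard library (`pickle`, `json`, and an in-memory `sqlite3` database standing in for PostgreSQL) to imitate keeping a pickled copy and a JSON copy of the same object side by side. `TunedModel`, the `objects` table, and the helper functions are hypothetical names made up for illustration.

```python
import json
import pickle
import sqlite3


class TunedModel:
    """Hypothetical stand-in for a tuned/trained geoml object."""

    def __init__(self, params, test_score):
        self.params = params
        self.test_score = test_score

    def predict(self, x):
        # Toy prediction so the restored object has behavior to exercise.
        return self.params["slope"] * x + self.params["intercept"]


def save(conn, key, obj):
    # Store the full object as a pickle and its attributes as JSON, side by side.
    conn.execute(
        "INSERT OR REPLACE INTO objects (key, pickled, state) VALUES (?, ?, ?)",
        (key, pickle.dumps(obj), json.dumps(vars(obj))),
    )
    conn.commit()


def load_object(conn, key):
    # Rebuild the complete Python object, methods included.
    row = conn.execute(
        "SELECT pickled FROM objects WHERE key = ?", (key,)
    ).fetchone()
    return pickle.loads(row[0])


def load_state(conn, key):
    # Read only the data; the class does not need to be importable for this.
    row = conn.execute(
        "SELECT state FROM objects WHERE key = ?", (key,)
    ).fetchone()
    return json.loads(row[0])


# In-memory SQLite is an assumption made so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE objects (key TEXT PRIMARY KEY, pickled BLOB, state TEXT)"
)

save(conn, "model-1", TunedModel({"slope": 2.0, "intercept": 1.0}, test_score=0.87))
model = load_object(conn, "model-1")   # full object, can call predict()
state = load_state(conn, "model-1")    # plain dict of parameters and scores
```

The data-only path is the one that would let us look up parameters or test accuracy without importing geoml at all, which may matter for reporting or for comparing many stored models.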

To Do

  1. #XX - Demonstrate a minimal working example of NewtDB with the Tuning API and a PostgreSQL database. Show how an object can be saved and then loaded back, and also show how to retrieve just the data.
  2. #XX - Develop a plan for how Tuning objects should be stored in PostgreSQL. Things to consider include retuning/retraining with the same data or with new data, easy access to parameters, test accuracy, etc., and how this relates to the way we bring new data stored in a DB into that object (same DB or different DB).
  3. #XX - Determine in which cases it is most appropriate to create an entirely new object, and in which cases it is okay to grab the existing object and re-run one of its functions (perhaps saving over the object in the DB or saving it as a new object in the DB). In the latter case, maybe it's best to just create a new object with the new data and retrain, retune, etc. On the other hand, we may be able to save a lot of CPU and/or DB storage space by simply updating an object at the training step rather than starting from scratch and redoing feature selection, tuning, and training.
  4. #XX - When implementing this, make sure objects are "persistent" and that the appropriate code is in place to keep track of updates to the objects. See this ZODB link.
  5. #XX - Use Zope or similar to manage generations of the objects being stored in the DB. As I understand it, this can be used to ensure we use a particular library version when accessing an object from the DB so that there aren't inconsistencies with, e.g., attribute names that may have changed in a later library version.
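For the last item, the general idea of "generations" can be sketched without any Zope machinery. The example below is plain Python and does not use the Zope API: each stored state carries a generation number, and small upgrade functions bring old states up to the current layout one step at a time. The attribute names (`n_feats`, `n_features`, `test_score`), the `generation` key, and all function names are hypothetical.

```python
def evolve_1_to_2(state):
    # Hypothetical rename: a later library version calls "n_feats" "n_features".
    state = dict(state)  # copy so the caller's dict is left untouched
    state["n_features"] = state.pop("n_feats")
    return state


def evolve_2_to_3(state):
    # Hypothetical addition: newer objects also record a test score.
    state = dict(state)
    state.setdefault("test_score", None)
    return state


# Maps a generation number to the function that upgrades it by one step.
EVOLVERS = {1: evolve_1_to_2, 2: evolve_2_to_3}
CURRENT_GENERATION = 3


def upgrade(state):
    """Apply each missing upgrade step in order until the state is current."""
    generation = state.get("generation", 1)
    while generation < CURRENT_GENERATION:
        state = EVOLVERS[generation](state)
        generation += 1
        state["generation"] = generation
    return state


old = {"generation": 1, "n_feats": 5, "alpha": 0.1}
new = upgrade(old)
```

Whether we upgrade stored objects like this or instead pin the library version that reads them is one of the decisions this issue should settle.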