Cadene/bootstrap.pytorch

Improve logging (logs.json) with SQLite

Cadene opened this issue · 0 comments

tl;dr: SQLite will replace logs.json

Our current implementation

We use a Logger object that stores data as lists of values associated with keys in a Python dictionary. This dictionary is kept in RAM. At the end of each train or eval epoch, Logger creates/flushes a logs.json file in the experiment directory.

logs/myexperiment/logs.json
{
  "train_epoch.epoch": [0, 1, 2, 3, 4, 5],
  "train_epoch.acc_top1": [0.0, 5.7, 13.8, 20.4, 28.1, 37.9]
}
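
For reference, a minimal sketch of this behaviour (the class and method names below are illustrative, not the exact Logger API):

import json
import os

class Logger:

    def __init__(self, dir_logs):
        self.path = os.path.join(dir_logs, "logs.json")
        self.values = {}  # key -> list of values, kept in RAM

    def log_value(self, key, value):
        self.values.setdefault(key, []).append(value)

    def flush(self):
        # the whole file is rewritten on every flush
        with open(self.path, "w") as f:
            json.dump(self.values, f)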

Its problems

  • If the code crashes before a flush, the data is lost, even though we want to use Logger precisely to monitor things such as CPU or GPU memory usage right before a crash!
  • We need to rewrite the full JSON file each time a new value is added.
  • We need to load the full JSON file each time we want to visualize something, even if only one new value was added.

Our constraints

  • We want to keep our logs inside the experiment directory (no external SQL/NoSQL database server; an embedded file such as SQLite might be acceptable).
  • We want to write new values only (for instance, at epoch 10 we only write the values of epoch 10).
  • We want concurrent reads and writes (at least on different keys).

Some propositions

The following tools store the data on the file system (not in RAM).

h5py (one file)


logs/myexperiment/logs.h5

Pros:

  • Uses NumPy arrays
  • Easy random access: data['train_epoch.epoch'][10]

Cons:

  • Extendable datasets (when you do not specify the final size up front) seem to require an explicit resize call (see the sketch below).
  • We have encountered many HDF5 bugs in the past when reading or writing from multiple threads/processes.
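
A minimal sketch of the append pattern this would impose, assuming the single-file layout above (the dataset name is taken from the logs.json example):

import h5py

with h5py.File("logs/myexperiment/logs.h5", "a") as f:
    key = "train_epoch.acc_top1"
    if key not in f:
        # maxshape=(None,) makes the dataset extendable
        f.create_dataset(key, shape=(0,), maxshape=(None,), dtype="f8")
    dset = f[key]
    dset.resize((dset.shape[0] + 1,))  # explicit resize before each append
    dset[-1] = 37.9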

LMDB


logs/myexperiment/logs/train_epoch.epoch.lmdb

Pros:

Cons:

  • Cumbersome to use
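
To illustrate the point above, a minimal sketch with the py-lmdb bindings, assuming the one-environment-per-key layout (here for train_epoch.acc_top1); everything has to be encoded to and decoded from raw bytes by hand:

import struct

import lmdb

env = lmdb.open("logs/myexperiment/logs/train_epoch.acc_top1.lmdb", map_size=2**26)
with env.begin(write=True) as txn:
    # key = step index as big-endian uint32, value = metric as big-endian double
    txn.put(struct.pack(">I", 5), struct.pack(">d", 37.9))
with env.begin() as txn:
    value = struct.unpack(">d", txn.get(struct.pack(">I", 5)))[0]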

netCDF


logs/myexperiment/logs.nc
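
For completeness, a minimal sketch of how appending could look with the netCDF4 Python bindings (the variable name and the unlimited "step" dimension are assumptions, not an existing schema):

import os

from netCDF4 import Dataset

path = "logs/myexperiment/logs.nc"
with Dataset(path, "a" if os.path.exists(path) else "w") as ds:
    if "step" not in ds.dimensions:
        ds.createDimension("step", None)  # unlimited dimension, grows on append
    if "train_epoch_acc_top1" not in ds.variables:
        ds.createVariable("train_epoch_acc_top1", "f8", ("step",))
    var = ds.variables["train_epoch_acc_top1"]
    var[var.shape[0]] = 37.9  # writing one index past the end appends a value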

One CSV (or binary) file per key

logs/myexperiment/logs/train_epoch.epoch.csv

Pros:

  • Very easy to understand and track

Cons:

  • Creates one file per tracked variable
  • Associating different variables for the same time step requires reading different files and aligning them
  • Difficult to implement properly (we would be reinventing the wheel)
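
That said, the per-key append itself is simple; a minimal sketch (the helper name and the directory layout are assumptions):

import csv
import os

def append_value(dir_logs, key, step, value):
    # hypothetical helper: one CSV file per key, one row appended per value
    path = os.path.join(dir_logs, "logs", key + ".csv")
    os.makedirs(os.path.dirname(path), exist_ok=True)
    new_file = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(["step", "value"])
        writer.writerow([step, value])

append_value("logs/myexperiment", "train_epoch.acc_top1", 5, 37.9)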

SQLite


logs/myexperiment/logs.sqlite

Pros:

  • A single file can grow large enough for our needs
  • Allows easy concurrent reads/writes
  • Caching system (TODO source)
  • Binary encoding
  • Indexing (easy to read only what we want)
  • Meta-data: timestamp, epoch_id, iteration_id
  • Fault-tolerant (in case of a crash)

Cons:

  • Requires a library to read the logs, and users must know SQL to write custom queries/applications (we could add a wrapper over SQLite in Logger)
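
A minimal sketch of what the write/read path could look like (the logs table, its columns, and the WAL pragma are assumptions, not a settled schema):

import sqlite3
import time

conn = sqlite3.connect("logs/myexperiment/logs.sqlite")
conn.execute("PRAGMA journal_mode=WAL")  # readers do not block the writer
conn.execute("""
    CREATE TABLE IF NOT EXISTS logs (
        key       TEXT,
        value     REAL,
        epoch_id  INTEGER,
        iter_id   INTEGER,
        timestamp REAL
    )""")
conn.execute("CREATE INDEX IF NOT EXISTS idx_logs_key ON logs (key)")

# write new values only: one INSERT per logged value
conn.execute(
    "INSERT INTO logs (key, value, epoch_id, iter_id, timestamp) VALUES (?, ?, ?, ?, ?)",
    ("train_epoch.acc_top1", 37.9, 5, None, time.time()),
)
conn.commit()

# read only what we want, thanks to the index on key
rows = conn.execute(
    "SELECT epoch_id, value FROM logs WHERE key = ? ORDER BY epoch_id",
    ("train_epoch.acc_top1",),
).fetchall()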

Comparing experiments with SQLite

import sqlite3

# all_experiments / list_of_metrics are assumed to exist; queries use the logs table sketched above
databases = []
for experiment in all_experiments:
    databases.append(sqlite3.connect("logs/{}/logs.sqlite".format(experiment)))
for experiment, database in zip(all_experiments, databases):
    for metric in list_of_metrics:
        # aggregates are computed by SQLite and may already be in its page cache
        min_metric = database.execute("SELECT MIN(value) FROM logs WHERE key = ?", (metric,)).fetchone()[0]
        max_metric = database.execute("SELECT MAX(value) FROM logs WHERE key = ?", (metric,)).fetchone()[0]
        # ... agglomerate min_metric/max_metric across experiments in Python