NeurodataWithoutBorders/pynwb

[Documentation]: Iterative writing to DynamicTable (trials, epochs, TimeIntervals)

cboulay opened this issue · 3 comments

What would you like changed or added to the documentation and why?

Hi, I'm trying to write unbounded streams directly to NWB files (one file per stream). So far, this works well for numeric TimeSeries. I was stuck for a while trying to write strings as event markers, especially as trials, epochs, or other TimeIntervals tables. The techniques described for TimeSeries and H5DataIO don't translate to event markers.
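
For context, the numeric case that already works for me looks roughly like this (a sketch only; the stream name, sampling rate, and units are placeholders, and nwbfile is created elsewhere):

import numpy as np
from pynwb import TimeSeries, H5DataIO

# first chunk of samples; maxshape=(None,) makes the on-disk dataset resizable
first_chunk = np.zeros((1000,), dtype="float64")
ts = TimeSeries(
    name="stream",  # placeholder name
    data=H5DataIO(first_chunk, maxshape=(None,)),
    unit="a.u.",
    rate=1000.0,
)
nwbfile.add_acquisition(ts)
# after io.write(nwbfile), the underlying HDF5 dataset can be resized and
# new samples written into the extended region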

Where I failed was that I attempted to call io.write(nwbfile) after nwbfile.add_epoch_column(...) but before nwbfile.add_epoch(...). I thought I was following the pattern of setting everything up, calling io.write(nwbfile), then filling in after the fact. However, it appears that you cannot call io.write(nwbfile) if you have added a new epoch column but your epochs table remains empty.

            if isinstance(value, (list, tuple)):
                if len(value) == 0:
                    msg = "Cannot infer dtype of empty list or tuple. Please use numpy array with specified dtype."
>                   raise ValueError(msg)
E                   ValueError: Cannot infer dtype of empty list or tuple. Please use numpy array with specified dtype.

../../../.venv/lib/python3.9/site-packages/hdmf/build/objectmapper.py:314: ValueError

However, if I add an epoch first, then things work fine.
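
To make the ordering concrete, here is a minimal sketch (the file name and the "marker" column are made up for illustration):

from datetime import datetime
from uuid import uuid4
from dateutil import tz

from pynwb import NWBFile, NWBHDF5IO

nwbfile = NWBFile(
    session_description="streaming event markers",
    identifier=str(uuid4()),
    session_start_time=datetime(2023, 1, 1, tzinfo=tz.gettz("US/Pacific")),
)
nwbfile.add_epoch_column(name="marker", description="event marker string")

# Fails if written now: the epochs table has a custom column but no rows,
# so the dtype of the empty column cannot be inferred.
# with NWBHDF5IO("epochs.nwb", "w") as io:
#     io.write(nwbfile)  # ValueError: Cannot infer dtype of empty list or tuple

# Works: add at least one epoch before the first write.
nwbfile.add_epoch(start_time=0.0, stop_time=1.0, marker="trial_start")
with NWBHDF5IO("epochs.nwb", "w") as io:
    io.write(nwbfile)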

I think there are a few solutions for this. The easiest is documentation -- just explain that you can't write an empty table if you've added a custom column. In my own tool, I'm just going to defer adding the custom column until I have data to put in it.

Another option would be to allow setting the dtype via add_{x}_column(..., dtype=str), but this is significantly more work. Or am I supposed to subclass VectorData and supply that as the col_cls argument?

For now, my solution is to not add new columns until I receive a marker event.

ETA: The other major difference from a TimeSeries is that the nwbfile has to be re-written with multiple calls to io.write(nwbfile).

Do you have any interest in helping write or edit the documentation?

Yes.


I realized after I wrote this issue that repeated calls to io.write(nwbfile) do nothing. For now I am leaving the io object open and writing once at the end when __del__ is called.
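
Roughly, the shape of that workaround (a sketch only; the class and method names are made up, and the marker column has to have been added to the epochs table before the first on_marker call):

from pynwb import NWBHDF5IO

class MarkerSink:
    """Keeps one NWBHDF5IO open for the life of the stream and writes once at teardown."""

    def __init__(self, nwbfile, path):
        self._nwbfile = nwbfile
        self._io = NWBHDF5IO(path, mode="w")
        self._closed = False

    def on_marker(self, start_time, stop_time, marker):
        # rows only accumulate in memory; nothing touches disk yet
        self._nwbfile.add_epoch(start_time=start_time, stop_time=stop_time, marker=marker)

    def close(self):
        if not self._closed:
            self._io.write(self._nwbfile)  # the single write at the end
            self._io.close()
            self._closed = True

    def __del__(self):
        self.close()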

What is the TimeIntervals id field? It seems that it can accept an H5DataIO or DataChunkIterator. What can that be used for?

rly commented

Hi @cboulay , we are working on making it possible to add rows to a DynamicTable after write by default (cc @mavaylon1). For now, adding rows after write is a little convoluted. You would have to predefine all your columns and wrap the data of each column (VectorData) in an H5DataIO with maxshape = (None, ) -- see code below. You can already add columns to a DynamicTable in append mode.

The easiest is documentation -- just explain that you can't write an empty table if you've added a custom column.

Thanks. We have an open issue ticket about that and unfortunately have not had the bandwidth to resolve it yet. We are updating the documentation here.

Another option would be to allow setting the dtype via add_{x}_column(..., dtype=str), but this is significantly more work. Or am I supposed to subclass VectorData and supply that as the col_cls argument?

Both are significantly more work, but the former would be good for us to do.

I thought I was following the pattern where I was setting up everything, calling io.write(nwbfile), then filling in after the fact.

In general, we recommend writing the data once you have all your data available, but I understand it is risky to hold all of that data in memory.

I think your proposal of not adding new columns until you receive a marker event makes sense. Try this code to add rows and columns to a trials table after an initial write. This will allow you to append to the file repeatedly (but you have to reopen the file after closing it).

from datetime import datetime
from uuid import uuid4
from dateutil import tz

from pynwb import NWBHDF5IO, NWBFile, H5DataIO

session_start_time = datetime(2018, 4, 25, 2, 30, 3, tzinfo=tz.gettz("US/Pacific"))

# create the file and add one trial so the trials table is not empty
nwbfile = NWBFile(
    session_description="Mouse exploring an open field",  # required
    identifier=str(uuid4()),  # required
    session_start_time=session_start_time,  # required
)
nwbfile.add_trial(start_time=1.0, stop_time=2.0)
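# wrap each column's data (including the id column) in H5DataIO with
# maxshape=(None,) so the on-disk datasets can grow after the first write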
nwbfile.trials.id.set_data_io(H5DataIO, {'maxshape': (None,)})
nwbfile.trials.start_time.set_data_io(H5DataIO, {'maxshape': (None,)})
nwbfile.trials.stop_time.set_data_io(H5DataIO, {'maxshape': (None,)})

with NWBHDF5IO("test_append_dynamic_table.nwb", "w") as io:
    io.write(nwbfile)

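# reopen the file in append mode to add a column and another row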
io = NWBHDF5IO("test_append_dynamic_table.nwb", mode="a")
nwbfile = io.read()
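# 'test' supplies a value for the row that already exists in the table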
nwbfile.add_trial_column('correct', 'whether the trial was correct', data=['test'])
nwbfile.trials.correct.set_data_io(H5DataIO, {'maxshape': (None,)})
nwbfile.add_trial(start_time=2.0, stop_time=3.0, correct='yes')
io.write(nwbfile)
io.close()

with NWBHDF5IO("test_append_dynamic_table.nwb", "r") as io:
    nwbfile = io.read()
    print(nwbfile.trials.to_dataframe())
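
The same open/append/write/close cycle should work for each subsequent batch of rows, e.g. (continuing the script above, so no new imports are needed):

# append another row in a later session
with NWBHDF5IO("test_append_dynamic_table.nwb", mode="a") as io:
    nwbfile = io.read()
    nwbfile.add_trial(start_time=3.0, stop_time=4.0, correct="no")
    io.write(nwbfile)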

@rly , this was very helpful, thank you!
I was able to complete my objective and I now have multiple live data streams sinking to a single nwb file.
I'll close the issue now and await the release of the other API to stream to disk.