G-Node/nix

concurrent reads on a file unsafe

Ott-Alexander opened this issue · 10 comments

There is a way to built HDF5-lib in thread-safe mode, which allows concurrent reads but that is only for C officially supported.
https://portal.hdfgroup.org/display/knowledge/Questions+about+thread-safety+and+concurrent+access

Wait, concurrent reads are unsafe?
The linked FAQ says

Does HDF5 support concurrent access to a single dataset from multiple processes?
If all processes are reading, then, yes, HDF5 (serial) does support this.

Reading while writing is also "safe" (in that there is no possibility of on-disk corruption) but the reader might be getting inconsistent data.

gicmo commented

NB: we are not using the official C++ bindings, but have our own version. They are a thin layer on top of the C library and I don't think they are done in a non-thread-safe way; but I am not 100% sure. They would need to be checked.

Doesn't everything use the core C library at the end of the day (h5py included)?

gicmo commented

@achilleas-k sure, but in the C++ one could cache/re-use handles or what not that makes it non-thread safe/aware.

@gicmo Good point. Had not considered.

Back to the original issue: Are concurrent reads unsafe?

gicmo commented

I don't think so, I can't think of something funny we do, that would make them. @nyxas you have any indication of reads being a problem? Crashes?

In nixview i'm currently building a way to load more data for plots and for that i'm using a second thread.
When it loads more data there a some seemingly random instances where it throws one of two errors.
After the first one nixview crashes completely the second one doesn't have any obvious consequences.
(going to push my branch soon and link to it)

Error 1:
#000: ../../../src/H5L.c line 813 in H5Lexists(): not a location
major: Invalid arguments to routine
minor: Inappropriate type
#1: ../../../src/H5Gloc.c line 195 in H5G_loc(): invalid group ID
major: Invalid arguments to routine
minor: Bad value

and error 2:

HDF5-DIAG: Error detected in HDF5 (1.8.16) thread 140473211713280:
#000: ../../../src/H5I.c line 1384 in H5Idec_ref(): can't decrement ID ref count
major: Object atom
minor: Unable to decrement reference count
#1: ../../../src/H5I.c line 1491 in H5I_dec_app_ref(): can't decrement ID ref count
major: Object atom
minor: Unable to decrement reference count
#2: ../../../src/H5I.c line 1421 in H5I_dec_ref(): can't locate ID
major: Object atom
minor: Unable to find atom information (already closed?)

So finally pushed it.

The errors occur in getAxis() in the LoadingThread. (line 112)
But the function only gets copies of the members, so there should be, as far as i know, no way for the main thread to interfere with it during this function, except if they can't both read the file at the same time.
https://github.com/Nyxas/NixView/blob/lineplotter_rework2/utils/loadthread.cpp

I built in a delay in run() before it starts and since then get no problems, but I can't say for sure if it's just very rare because it runs only rarely, or it is 'fixed' because the lineplotter doesn't access the DataArray anymore.

And the lineplotter that controls the Thread:
https://github.com/Nyxas/NixView/blob/lineplotter_rework2/plotter/lineplotter.cpp

I think both threads call functions on the dimension of the array at the same time.
In loadthread during the getAxis() and in lineplotter during the calcStartExtent() (line 141) and checkForMoreData() (line 457).
I tested them and they also throw the same errors

Closing this for now: The current state is that concurrent access to the file should be avoided.