ImagingDataCommons/libdicom

Allow caller to provide its own file I/O

bgilbert opened this issue · 6 comments

Some applications may want to use their own file abstraction rather than having libdicom read directly from the filesystem. (For example, the application may have a DICOM File in memory and want to avoid writing it to a temporary file.) Please consider providing an alternative to dcm_file_create() that takes an opaque pointer plus read/seek/eof callback functions that operate on it.

I have been considering this before but was unsure about how to best implement such an abstraction. I think your suggestion is great.

If I understand you correctly, we could provide a dcm_file_create_from_reader() (or something along those lines, which would receive a FILE-like object along with the callback functions for data access. The dcm_file_create() function could then reuse that function under the hood and supply the fread/fseek/feof functions as read/seek/eof callback functions. If we store those as function pointers on DcmFile objects, the interface of its functions (e.g., dcm_file_read_metadata()) would not have to change.

The library currently does not yet support writing, but we may want to consider adding a write callback at this point, so that we won't have to change the API later. What do you think @bgilbert?

Yup, that's the idea. It'd be worth spending a bit of time to define the callback function semantics, since the fread/fseek/feof signatures can't be reused verbatim anyway. (They take FILE * rather than void *.) It might be worth combining the two size arguments in fread, and possibly defining its return value to be signed as read(2) does so a separate feof isn't needed. dcm_file_create can then configure the DcmFile object with small wrappers around fread/fseek/feof.

Re writing, if there isn't an intention to support both reading and writing in the same DcmFile, it might be a bit cleaner to create separate dcm_file_create_from_reader and dcm_file_create_from_writer functions. The latter could then be deferred until the library supports writing. I'd suggest splitting dcm_file_create the same way, too, if the current API isn't considered stable yet.

We went though this a while ago with libvips and my feeling was that there are some problems with stdio on Windows:

  • You can have 1024 open files with stdio, but 2048 with plain open/read/write/close.
  • glib has some nice wrappers over open to handle filename encodings.
  • You can get into trouble if you pass FILE pointers between DLLs, since different DLLs can use different C runtimes and therefore have different stdio implementations. You can't open a FILE in one DLL and close it in another, for example.

Plus the benefits of stdio are rather small. You can implement buffered read and write in under 100 lines of C, and have better EOF handling.

We went for the base open/read/write/seek calls, with user callbacks that exactly match the POSIX specs for these functions. This has the advantage of avoiding stdio, and giving users a well documented and understood interface. We jumped off the spec and used 64 bit ints for offset and size.

Strictly speaking, the callback API and the default I/O handling are separate concerns, right? I agree that the unbuffered I/O API presents a cleaner interface for the callbacks. libdicom can then map that onto whatever default I/O implementation it prefers.

Since DICOM data are often communication over network (DICOMweb), a clean abstraction of the underlying data access operations via the callback API would facilitate reading data sets from in memory file representations. I think this would also be useful for other language bindings (e.g., JavaScript/WASM where one would rarely access files on disk).

Implemented with #36