NEUBIAS/training-resources

Image data formats: refactor into one activity per format

Opened this issue · 27 comments

Recent teaching experiences showed that handling such "monster activities" (https://neubias.github.io/training-resources/image_file_formats/index.html#open) can be challenging for trainers, students and maintainers, because

  1. one can get lost
  2. especially if one does not want to show everything
  3. not all sub-activities can be implemented in all platforms and not all sub-activities may be interesting for all audiences
  4. some sub-activities would benefit from additional explanations in the preface, which would explode the preface
  5. specifically here, I would, e.g., like to add opening a "movie" sequence of JPEG files, which not everyone may find relevant or interesting, and I would also like to add some other commercial formats that occur frequently at EMBL (again, not everyone may find this relevant).

I would therefore like to ask how you feel about splitting this up into one activity per image data format. A disadvantage could be that if we find new general tricks for how best to open such image data, we would need to change this in several activities.

ping @manerotoni @k-dominik @AnniekStok

In fact, another reason to refactor this is that in Python, saving in different formats may be less easy and may require various dependencies. In Fiji everything is conveniently bundled under File > Save as..., but this will be less simple on other platforms.

Hello @tischi,
I have never taught the module, but from a quick look, the activities are overwhelming. It will not hurt to separate them. We may end up with separate modules for 'simpler' data sets and more complex data sets; this should not affect the activities.

Hi @tischi,

I agree the activity looks a bit complicated, and splitting it up would make it easier to pick out the specific parts you want to teach. Including different microscope formats and how to directly concatenate a movie from a folder would be very useful. As you said, splitting the activities by file format has the benefit that we can pick the specific reader/writer method needed for each type of data, so it should be easier. We did something similar for a Napari workshop recently. Just wondering: right now the module appears to serve multiple purposes, which, combined with all the different file formats, can make it a bit overwhelming (and possibly very large). Would it make sense to split it further into separate consecutive modules, or would that fragment the flow too much?

  • Opening data in different (microscope) formats (e.g. you may need Bio-Formats in Fiji or aicsimageio in Napari), where you may see that the data is organized in series, or that channels are opened as separate layers or as a hyperstack.
  • Differences in metadata between file formats and where to find it.
  • Saving data in a new format (e.g. Tif to JPEG) and the effect it has on pixel intensities / bit depth.
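The bit-depth effect in the last bullet can be illustrated without any imaging library. Below is a minimal pure-Python sketch (made-up pixel values, hypothetical function names) that converts 16-bit intensities to the 8-bit range typically used for JPEG in two common ways, min-max rescaling and clipping; which strategy a particular tool actually applies is not assumed here.

```python
def to_uint8_minmax(values):
    """Linearly rescale integer intensities to 0..255 using the data's own min and max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant image carries no contrast; map everything to 0
        return [0] * len(values)
    return [round((v - lo) * 255 / (hi - lo)) for v in values]


def to_uint8_clip(values):
    """Keep values unchanged but saturate everything above 255 (and below 0)."""
    return [min(max(v, 0), 255) for v in values]


# Made-up 16-bit intensities (valid range 0..65535)
pixels16 = [100, 101, 102, 1000, 30000, 65535]

scaled = to_uint8_minmax(pixels16)
clipped = to_uint8_clip(pixels16)
print("rescaled:", scaled)
print("clipped: ", clipped)
```

With rescaling, the three nearly identical dim values collapse onto one 8-bit value; with clipping, all bright values saturate at 255. Either way, information from the 16-bit original is lost.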

@AnniekStok

So you are proposing to separate opening and inspecting image pixel data from opening and inspecting image metadata, putting them into two different modules? I am not sure about this; could you please elaborate a bit more on why you think this would be good?

Regarding the saving being a different module: I think we all agree (see #719).

I was just wondering, partly to simplify the module and partly because metadata feels like it could be a topic of its own, depending on how deep you want to go into it. We recently tested a microscope that saved the metadata separately from the TIFF output (I actually did not like that at all: what if you lose the metadata file or accidentally move it somewhere else?). But occasionally one might want to read only the metadata. However, I also understand it is convenient to keep it together with opening different image formats, since each format has its own metadata structure.

Good point... Didactically, I thought it could be good to make the point that mapping the binary pixel data into an XYZCT space requires some metadata, e.g. which TIFF plane corresponds to which Z, C and T position. If that information is missing or wrong, it will be hard to read the data in a meaningful way at all.

But the microscope settings metadata is a different matter and indeed something that could be looked at separately.

Maybe we could restrict this module to "essential metadata" such as the XYZCT mapping and pixel spatial calibration?

And then refer to a future module for microscope related metadata?
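The XYZCT mapping mentioned above can be made concrete with a small sketch. It assumes the OME convention for dimension order strings such as "XYZCT" (the letters after XY are listed from fastest- to slowest-varying axis) and shows that the same flat sequence of TIFF planes is interpreted differently when the metadata declares a different order. The function name and the example sizes are illustrative only.

```python
def plane_index(dimension_order, sizes, z, c, t):
    """Position of the 2D plane (z, c, t) in a flat sequence of planes.

    `dimension_order` uses the OME style, e.g. "XYZCT": the three letters after
    "XY" go from fastest- to slowest-varying axis.
    `sizes` maps "Z", "C" and "T" to the number of planes along each axis.
    """
    coords = {"Z": z, "C": c, "T": t}
    index, stride = 0, 1
    for axis in dimension_order[2:]:
        if not 0 <= coords[axis] < sizes[axis]:
            raise IndexError(f"{axis}={coords[axis]} is out of range")
        index += coords[axis] * stride
        stride *= sizes[axis]
    return index


# Hypothetical file with 3 Z-slices, 2 channels and 4 time points (24 planes)
sizes = {"Z": 3, "C": 2, "T": 4}

# The same request (z=1, c=1, t=0) points to different planes
# depending on which dimension order the metadata declares
print(plane_index("XYZCT", sizes, z=1, c=1, t=0))
print(plane_index("XYCZT", sizes, z=1, c=1, t=0))
```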

Yes I see your point. Depending on the reader the image may be displayed correctly automatically, but if not, it is helpful to know where to find this information and how to apply it (for which we can refer in part to the spatial calibration module). I think restricting the module to the essential metadata and referring to a separate module that dives into microscopy metadata could work!

I am still not sure. At least in Fiji, reading the metadata is as easy as checking one box in the Bio-Formats Importer. Thus, teaching how to open it does not add much overhead. Of course, really digging through all the metadata could take a lot of time, but I don't think we need to do that here. We can just tell the students: "That's how you can access metadata easily, good luck finding what you need in there".

@manerotoni @AnniekStok

Which image formats should we teach to open?

I think it would be important to cover some of the typical complex cases, e.g.

  • TIFF file containing one image with microscope metadata
    • To show something normal
  • CZI or LIF file containing several images
    • To teach that one file can contain several images
  • Olympus VSI file
    • To teach that one image can be distributed over several files and folders
  • TIFF movie distributed into several files with a naming scheme such as im_t0.tif, im_t1.tif
    • To teach that sometimes the axes metadata is in the file name
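For the last case, here is a minimal sketch of recovering the time axis from file names with the standard library only. The regular expression assumes the im_t0.tif, im_t1.tif naming scheme from the list; other schemes need a different pattern. It also shows why plain string sorting is not enough once the index has more than one digit.

```python
import re


def sort_by_time_index(filenames, pattern=r"_t(\d+)\.tif$"):
    """Return (t, filename) pairs sorted by the time index encoded in the file name.

    `pattern` must contain one group that captures the index; the default
    matches names like im_t0.tif, im_t1.tif, ...
    """
    indexed = []
    for name in filenames:
        match = re.search(pattern, name)
        if match is None:
            raise ValueError(f"no time index found in {name!r}")
        indexed.append((int(match.group(1)), name))
    return sorted(indexed)


files = ["im_t10.tif", "im_t2.tif", "im_t0.tif", "im_t1.tif"]
print(sorted(files))  # plain string sorting puts im_t10 before im_t2
print(sort_by_time_index(files))
```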

What do you think?

Please note that my current plan is to teach "big image data formats" like OME-Zarr and XML/HDF5 in another module, because they need the additional concept of a resolution pyramid.
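A resolution pyramid stores the same image at successively lower resolutions. The sketch below only computes the level shapes to illustrate the idea; the downsampling factor of 2, the 256-pixel stopping size and the 2D-only treatment are illustrative assumptions, and actual OME-Zarr or XML/HDF5 writers choose their own levels.

```python
def pyramid_shapes(shape, factor=2, min_size=256):
    """(Y, X) shape of every level of a 2D resolution pyramid.

    Each level is downsampled by `factor` along Y and X (rounding up) until
    neither dimension exceeds `min_size`. Requires factor >= 2 and min_size >= 1.
    """
    levels = [tuple(shape)]
    while max(levels[-1]) > min_size:
        y, x = levels[-1]
        # -(-a // b) is integer ceiling division
        levels.append((-(-y // factor), -(-x // factor)))
    return levels


for level, level_shape in enumerate(pyramid_shapes((10000, 6000))):
    print(level, level_shape)
```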

Please also note that my current plan is to have a whole module about OME-TIFF, which can also be multi-resolution and can contain several images, and as such is quite complex and too much for an overview module, imho.

I would keep the metadata to a minimum (axis notation, pixel size, Dt). The rest could be addressed in another module if necessary; it is often about how the image has been acquired (laser power, filters, etc.).
I also think there is nothing wrong with checking metadata using Bio-Formats/Fiji and then using this info in Python.
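A minimal sketch of what such "essential metadata" (axis notation, pixel size, Dt) could look like once it has been read off, e.g., in Fiji and carried over to Python. The class, its field names and the example values are hypothetical and not taken from Bio-Formats or bioio.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class EssentialMetadata:
    """Axis order, spatial pixel size and frame interval (hypothetical container)."""
    axes: str                        # e.g. "TZYX"
    pixel_size_um: Dict[str, float]  # per spatial axis, e.g. {"Z": 0.5, "Y": 0.25, "X": 0.25}
    frame_interval_s: float          # Dt between consecutive time points

    def physical_position(self, index: Tuple[int, ...]) -> Dict[str, float]:
        """Convert a pixel index (ordered like `axes`) into seconds and micrometers."""
        if len(index) != len(self.axes):
            raise ValueError("index must have one entry per axis")
        position = {}
        for axis, i in zip(self.axes, index):
            if axis == "T":
                position["t_s"] = i * self.frame_interval_s
            elif axis in self.pixel_size_um:
                position[f"{axis.lower()}_um"] = i * self.pixel_size_um[axis]
            # Axes without a physical size (e.g. "C") are skipped
        return position


# Hypothetical values, e.g. as read from the Bio-Formats metadata in Fiji
meta = EssentialMetadata(
    axes="TZYX",
    pixel_size_um={"Z": 0.5, "Y": 0.25, "X": 0.25},
    frame_interval_s=2.0,
)
print(meta.physical_position((3, 4, 100, 50)))
```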

Didactically, the examples you picked are fine. I would also include something with tiles, just the opening. Stitching does not fit here and would be a separate, much-needed module. It is good that we now plan to separate the activities per data format (do CZI and LIF separately). The choice of data formats is somewhat facility/institute specific and depends on which instruments you have.

I was not aware the ome-tiff can do pyramids :-)

I agree those examples are good! With czi / lif / (ndi) it could be nice to explore the different options for splitting channels and/or timepoints using the bioformats importer. For concatenating a movie from a folder containing z-stacks, it might be nice to show the virtual stack option and how to set the hyperstack correctly.
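In Fiji, the virtual stack option avoids reading every file up front. The same idea, sketched in plain Python for comparison: the class and the stand-in loader are hypothetical, and in practice the loader could be a TIFF reader such as tifffile.imread.

```python
from typing import Any, Callable, Sequence


class VirtualStack:
    """Sequence of 2D planes that are only read from disk when accessed.

    `load_plane` is any function that takes a file path and returns that
    plane's pixels.
    """

    def __init__(self, paths: Sequence[str], load_plane: Callable[[str], Any]):
        self.paths = list(paths)
        self.load_plane = load_plane

    def __len__(self) -> int:
        return len(self.paths)

    def __getitem__(self, i: int) -> Any:
        # Reading happens here, one file per access; nothing is cached
        return self.load_plane(self.paths[i])


# Stand-in loader that records which files were actually read (hypothetical)
reads = []


def fake_loader(path):
    reads.append(path)
    return f"pixels of {path}"


paths = [f"t{t}_z{z}.tif" for t in range(2) for z in range(3)]
stack = VirtualStack(paths, fake_loader)
print(len(stack), len(reads))  # 6 planes known, none read yet
print(stack[5], reads)
```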

I think I would also like to add one activity for the ilastik HDF5 format, because HDF5 in general is important to know.

Hi,

I started the refactoring by implementing an activity to open a CZI image (see above commit).

May I ask for your help with the other file formats? It is relatively straightforward, just copy what I did for the CZI format.

I don't think we need PRs for this, simply push to master with commit messages referring to this issue (#720), e.g.:

git commit -m "Add activity to open CZI image, #720"

@felixS27 could you please modify this as mentioned in the TODO?

@AnniekStok could you be motivated to add activities to open some of the other formats that we agree upon?


Regarding the TIFF series: I actually want to change this to EM volume slices instead of a movie, because these are very relevant here at EMBL.

I now also added a TIFF series activity.

@felixS27 could you please look into implementing the corresponding python activity? I do not know whether this could also be done with bioio. If not, maybe tifffile offers something? You could ask on the forum if you do not find anything; I am positive that there should be something...

Hi,
yes, I could try to work on that (maybe only after I am back from traveling, though). I can do LIF and TIF, but I do not have any VSI files; is there sample data for that? Do we actually want to include activities for opening different formats in napari as well?

@tischi I will look into this.

@tischi sure, I will do this. Should I then also do this for the other file formats in parallel?

Yes please, you could also start the refactoring, and @AnniekStok could fill in the ImageJ implementations.

Thanks!!!

@tischi I split the "open with bioio" Python activities into opening a .tif, a .lif, and a .czi file. For the latter two, I included opening and inspecting both images. I also implemented a script to open the EM TIFF series.
I am not sure whether some of the code is already too advanced, but the implementation is now short and boiled down to the essentials.
I am happy to discuss the things I could improve.

@jhennies I created a Python script for reading EM TIFF series, as discussed above. @tischi suggested that I ping you to ask whether you think this is the proper way, and how you are doing this yourself.

@felixS27 can you quickly link the python code? Then I'll have a look

Here is roughly how I'm doing it:

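import os
from glob import glob

import numpy as np
from tifffile import imread, imwrite

data_folder = 'path/to/data'
pattern = '*.tif'

# sorted() orders file names as strings, so slice numbers in the names
# should be zero-padded (e.g. slice_0002.tif) to keep the correct Z order
filepaths = sorted(glob(os.path.join(data_folder, pattern)))


def some_func(image_slice):
    # Hypothetical placeholder for your own slice-wise operation
    return image_slice


# Then you can process for example like so if you do slice-wise operations:
out_dirpath = '/path/to/output'
os.makedirs(out_dirpath, exist_ok=True)
for filepath in filepaths:
    image_slice = imread(filepath)
    result_slice = some_func(image_slice)
    # Keep the original file name so the slice order is preserved
    out_filepath = os.path.join(out_dirpath, os.path.basename(filepath))
    imwrite(out_filepath, result_slice, compression='zlib')

# Or load the stack into memory (if you really have to ...):
full_stack = np.stack([imread(filepath) for filepath in filepaths])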
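Before taking the "load the stack into memory" route, a back-of-the-envelope estimate tells whether it is feasible at all. This sketch only does the arithmetic; the volume dimensions are made up, and the estimate ignores per-array overhead.

```python
def stack_size_bytes(n_slices, height, width, bytes_per_pixel):
    """Uncompressed size of a stack of equally sized 2D slices."""
    return n_slices * height * width * bytes_per_pixel


# Hypothetical EM volume: 2000 slices of 8192 x 8192 pixels, 8 bit (1 byte per pixel)
size = stack_size_bytes(2000, 8192, 8192, 1)
print(f"{size / 1024**3:.1f} GiB")
```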

This is the implementation I was talking about.

Logically it's the same. Just one question: I've never used the BioImage package; is there a specific reason you are using it here? Are we using it anywhere else in the teaching materials?

True. Then I think I can leave it like that :)
For the Python implementation of the image file format module I used the BioImage package, because it is quite easy to use and you only need to teach one package to open a lot of different image file formats. That is the main reason I used it in this context. In theory, it would also allow you to run the same code on a different set of image files without having to load different packages or use different interfaces.

Good enough for me :) I agree that you can leave it like this

I just added the ImageJ GUI activities for TIF and LIF and also added the reference to the BioImage versions.