girder/large_image

TIFF with multiple image frames being read as having 1 frame

dodewall opened this issue · 16 comments

Hello! I'm trying to use large-image to view & manipulate large (~12GB) timelapse acquisitions (~1300 16-bit grayscale frames saved as a .tif file). When I use source = large_image.open(file_path), the source object shows the correct sizeX and sizeY, but source.frames returns "1". The tileIterator also only returns a single tile. Is there any advice for troubleshooting available?

large_image has several tile source readers that read tiff files. Tiff files vary enormously, and some writers (such as ImageJ) claim to write tiff files that are not actually compliant with the specification. From Python, you can see which reader was used by doing something like print(large_image.open("file.tiff")), which prints the name of the class that actually did the reading.

For any tiff file, we can see its internal structure using the tifftools python package. pip install tifftools in any modern version of python, then tifftools dump file.tiff will print the internal details. If the file is compliant with the tiff specification, I'd expect this output to be huge (a few dozen lines or more of output per frame). If you can share the first hundred-ish lines of that output, then we can know exactly what is going on. If your file has private data in it, such as names, it could be revealed in this output, so please make sure this is publicly shareable.

Thank you @manthey! Below is the output of tifftools dump file.tiff (file.tiff being replaced with the file path I'm interested in). It's not even 100 lines:

Header: 0x4d4d
Directory 0: offset 8 (0x8)
NewSubfileType 254 (0xFE) LONG: 0
ImageWidth 256 (0x100) LONG: 2190
ImageLength 257 (0x101) LONG: 946
BitsPerSample 258 (0x102) SHORT: 16
Compression 259 (0x103) SHORT: 1 (None 1 (0x1))
Photometric 262 (0x106) SHORT: 1 (MinIsBlack 1 (0x1))
ImageDescription 270 (0x10E) ASCII: ImageJ=1.54f
images=3337
slices=3337
loop=false
min=104.0
max=4095.0

StripOffsets 273 (0x111) LONG: 291698
SamplesPerPixel 277 (0x115) SHORT: 1
RowsPerStrip 278 (0x116) LONG: 946
StripByteCounts 279 (0x117) LONG: 4143480
ImageJMetadataByteCounts 50838 (0xC696) LONG: <3338> 12 78 78 78 78 78 78 78 78 78 80 80 80 80 80 80 80 80 80 80 ...
ImageJMetadata 50839 (0xC697) BYTE: <278106> 73 74 73 74 108 97 98 108 0 0 13 9 0 116 0 58 0 49 0 47 ...

This is a "not-quite-a-tiff" file written by ImageJ. Such a file is a valid 1-frame tiff with all the extra frames appended afterwards without proper tiff references to them. Normally large_image asks the tifffile source to read these, since it has specific code to handle them. Therefore, I wonder whether the file would read correctly if large_image picked the tifffile source module. Does the following show the correct number of frames?

import large_image_source_tifffile  # note that this is tifffile not tiff

source = large_image_source_tifffile.open(file_path)
print(source.frames)

If so, then I need to dig into why the tifffile source wasn't chosen by default. If not, then I'll need to look a little deeper.

Running the code above on the "not-quite-a-tiff" file throws the following error:
TileSourceError: File cannot be opened via tifffile source: 'no maximum series'

Is there a way to convince ImageJ to write the tiff file "properly"?

I don't see anything in ImageJ's user guide to vary how it saves tiff files.

Can you check if you have a very recent version of the tifffile python package in your environment? If not, maybe upgrading that will help. If it is very recent, then the tifffile package is failing to read ImageJ output and we could probably hunt down what is going on there. If you can share the first 300,000 bytes or so of your file, I'd be able to replicate the issue (that is basically all of the header information and none of the imagery, since based on tifftools dump the header is the first 291,698 bytes).

I have tifffile version 2024.8.10. If that's meant to be the date of its release, then it is very recent.
Happy to share part of the file, but please forgive my ignorance - what's the easiest way to split the first 300k bytes from the file for your replication? Is it possible to use tifftools?
Thank you for your guidance.

The linux command head -c 300000 file.tiff > tiff-header.dat will do it.
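Since head is not available in a stock Windows shell, here is a minimal Python sketch that does the same job as the command above. The function name copy_head and the output file name are made up for illustration; nothing here depends on large_image or tifftools.

```python
from pathlib import Path


def copy_head(src, dst, num_bytes=300_000):
    """Copy the first num_bytes of src into dst; return the bytes written."""
    # Read only the requested prefix so the multi-gigabyte file is never
    # loaded into memory as a whole.
    with open(src, "rb") as fin:
        data = fin.read(num_bytes)
    Path(dst).write_bytes(data)
    return len(data)


# Example call (hypothetical file names):
# copy_head("file.tiff", "tiff-header.dat")
```

The return value will be smaller than num_bytes only if the source file itself is shorter than that.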

I've attached the first ~300 KB as a .txt file here (neither .tif nor .dat files are supported for attachment):

nov4_d.sbdsort1_div15 [aligned].txt

With this and the file extended with random data to a total of 13827084458 bytes, my instance of large_image uses tifffile and properly reports 3337 frames. This worked on several versions of Python and on Linux and OSX. This means either your actual file is a different length than I'd expect from the headers or your environment is somehow significantly different from mine. Can you confirm your file's length? And, if that matches, can you give details on your OS/Python versions and which version of large_image you have installed?
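For reference, the expected length can be worked out from the tifftools dump earlier in the thread. This is only a sketch: it assumes the frames are stored back to back after the header, each with the same byte count as the first strip, and that nothing follows the last frame. The helper frame_offset is a made-up name for illustration.

```python
# Values copied from the tifftools dump earlier in this thread.
STRIP_OFFSET = 291_698        # StripOffsets: where the first frame's pixels start
STRIP_BYTE_COUNT = 4_143_480  # StripByteCounts: bytes in one frame
NUM_IMAGES = 3337             # images=3337 in the ImageJ ImageDescription

# Cross-check: ImageWidth * ImageLength * 2 bytes per 16-bit sample.
assert 2190 * 946 * 2 == STRIP_BYTE_COUNT


def frame_offset(index):
    """Byte offset of frame `index`, assuming contiguous, unpadded frames."""
    return STRIP_OFFSET + index * STRIP_BYTE_COUNT


# The file should end where a hypothetical frame number NUM_IMAGES would begin.
expected_length = frame_offset(NUM_IMAGES)
print(expected_length)  # 13827084458
```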

File length is confirmed as 13827084458 bytes.
OS version: Windows 11 Home Version 10.0.22631 N/A Build 22631
Python version: 3.12.5
large_image version: 1.29.4.

I tried using large_image_source_tifffile.open(image_file) in both python on the command line and in a virtual environment with the above versions of python and large_image installed. Same result, specifically:

large_image.exceptions.TileSourceError: File cannot be opened via tifffile source: 'no maximum series'

Could the file path be the issue? I had to add an escape character ('\') before each backslash in the file path since I'm in a Windows environment.
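On the path question: doubling each backslash is one of several equivalent ways to spell a Windows path in Python, so the escaping by itself should not change which file is opened. A small sketch with a made-up path (not the real file from this thread):

```python
from pathlib import PureWindowsPath

# Hypothetical path, used only for illustration.
escaped = "C:\\Users\\example\\stack.tif"  # every backslash doubled
raw = r"C:\Users\example\stack.tif"        # raw string, no doubling needed
forward = "C:/Users/example/stack.tif"     # forward slashes are accepted too

# The escaped and raw spellings are the very same string.
assert escaped == raw

# The forward-slash form is a different string but names the same Windows path.
assert PureWindowsPath(forward) == PureWindowsPath(raw)

print(PureWindowsPath(raw).name)  # stack.tif
```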

For what it's worth, when I try opening the file without specifying the source, this is the result (reading as a JPEG?)

Command: large_image.open(image_file)

Result: PILFileTileSource ("('C:\\\\Users\\\\oadew\\\\Downloads\\\\nov4_d.sbdsort1_div15 [aligned].tif', 'JPEG', 95, 0, 'raw', False, '__STYLESTART__', None, '__STYLEEND__')", None),None

I had only tried on Linux and OSX. I get exactly your result on Windows. The culprit is a sanity check that finds the largest image series by computing np.prod(s.shape). On Linux and OSX this behaves as expected; on Windows, numpy's default integer is int32 (not int64), so the product overflows and produces the wrong value. I'll have a fix for this shortly (but I worry that we've made similar assumptions somewhere else).
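The overflow can be reproduced on any platform by forcing a 32-bit accumulator. This is a sketch of the failure mode only, not the actual large_image code or its fix; the shape is taken from the tifftools dump above, and as far as I know the int32 default on Windows applies to NumPy versions before 2.0.

```python
import math

import numpy as np

# frames, rows, columns from the dump earlier in this thread
shape = (3337, 946, 2190)

# Python integers have arbitrary precision, so this is the true pixel count.
exact = math.prod(shape)

# Forcing int32 mimics the old Windows default integer: the product wraps
# around silently and comes out negative.
wrapped = int(np.array(shape, dtype=np.int32).prod(dtype=np.int32))

# Asking for int64 explicitly gives the right answer on every platform.
safe = int(np.array(shape, dtype=np.int64).prod(dtype=np.int64))

print(exact)    # 6913396380
print(wrapped)  # -1676538212
print(safe)     # 6913396380
```

A negative size would lose any "is this the largest series so far" comparison, which is consistent with the 'no maximum series' error reported above.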

Thanks for working through this.

> For what it's worth, when I try opening the file without specifying the source, this is the result (reading as a JPEG?)

That `'JPEG'` term indicates that if you ask for part of the image as an image tile, it will default to returning a JPEG. You can override this and ask for any output format PIL supports, but JPEG is the default because it is often an acceptable choice for a web tile server.

@dodewall You can try this out by installing the development release (pip install "large-image-source-tifffile>=1.29.6.dev2").

Amazing - this works; thank you!

Thanks for the confirmation, and I'm glad we could hunt down what was going on.