tiff2jpeg is a fast batch TIFF-to-JPEG converter that preserves the folder structure. It is purpose-built to convert hundreds of thousands of TIFF images stored in intricate folder structures, typically raw data from microscope systems such as Mica (Leica), Cell-IQ (Yokogawa), Incucyte (Sartorius), and others. The script uses multithreading and other optimizations, and can run at more than 200 iterations per second.
Imagine you have the folder structure below. Each plate folder contains many subfolders (usually named after a 96-well plate naming scheme), and each well folder holds thousands of TIFF files. You want to convert them all to JPEG in a new parent folder.
cancer_treated_2023.08.08/
├─ plate_01/
│ ├─ A01_1/
│ ├─ A02_1/
│ ├─ A03_1/
│ ├─ ...
├─ plate_02/
│ ├─ A01_1/
│ ├─ A02_1/
│ ├─ A03_1/
│ ├─ ...
├─ plate_03/
│ ├─ A01_1/
│ ├─ A02_1/
│ ├─ A03_1/
│ ├─ ...
├─ ...
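To make the idea of preserving the folder structure concrete, here is a minimal sketch of how an output path could be derived from an input path. The function name mirrored_jpeg_path, the file name image_0001.tiff, and the output folder cancer_treated_jpeg are hypothetical and chosen only for illustration; the actual script may compute paths differently.

```python
from pathlib import Path


def mirrored_jpeg_path(tiff_path: Path, input_dir: Path, output_dir: Path) -> Path:
    """Map a TIFF path under input_dir to the same relative location under output_dir."""
    # Keep only the part of the path below the input root, e.g. plate_01/A01_1/image_0001.tiff
    relative = tiff_path.relative_to(input_dir)
    # Re-root it under the output folder and swap the extension.
    return (output_dir / relative).with_suffix(".jpeg")


# Hypothetical example file inside the tree shown above.
src = Path("cancer_treated_2023.08.08/plate_01/A01_1/image_0001.tiff")
dst = mirrored_jpeg_path(
    src,
    input_dir=Path("cancer_treated_2023.08.08"),
    output_dir=Path("cancer_treated_jpeg"),
)
print(dst.as_posix())  # cancer_treated_jpeg/plate_01/A01_1/image_0001.jpeg
```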
- Setup: Ensure the tqdm and imageio libraries are installed.
- Directories: Set the input_dir and output_dir variables to the directory containing your .TIFF images and the desired output location for the .JPEG files, respectively.
- Executor Selection: Depending on workload characteristics and system features, you can choose between ThreadPoolExecutor and ProcessPoolExecutor. The provided code defaults to threads. To switch, uncomment the relevant lines in the code.
- Adjusting Workers: Tune the max_workers parameter based on system capacity and observed performance. A recommended starting point is the number of available CPU threads.
- Adjust JPEG Quality: You can change the JPEG quality. 95 is recommended, but you can go up to 100 if you like.
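The settings above might look roughly like this in code. This is only a sketch: input_dir, output_dir, and max_workers are the names used above, while jpeg_quality, check_settings, and the example paths are assumptions made for illustration and may not match the script.

```python
import os
from pathlib import Path

# Hypothetical example paths; replace with your own directories.
input_dir = Path("cancer_treated_2023.08.08")
output_dir = Path("cancer_treated_2023.08.08_jpeg")

# Starting point for the pool size: one worker per logical CPU.
# os.cpu_count() can return None, so fall back to a single worker.
max_workers = os.cpu_count() or 1

# JPEG quality (assumed variable name); 95 is the recommended value.
jpeg_quality = 95


def check_settings(quality: int, workers: int) -> None:
    """Reject obviously invalid values before starting a long batch run."""
    if not 1 <= quality <= 100:
        raise ValueError(f"JPEG quality must be between 1 and 100, got {quality}")
    if workers < 1:
        raise ValueError(f"max_workers must be at least 1, got {workers}")


check_settings(jpeg_quality, max_workers)
```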
You can see tiff2jpeg in action below, converting more than 7,700 TIFF image files (over 30 GB) in less than one minute.
This code converts .TIFF files to .JPEG and is optimized for speed using several techniques:
Instead of repetitively checking and potentially creating a directory for every image, we maintain a set of directories that have already been created. This reduces redundant filesystem interactions.
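The directory cache described above can be sketched as follows. The names created_dirs and ensure_dir are assumptions for illustration, not necessarily those used in the script. With threads, two workers may occasionally both miss the cache for the same folder; that is harmless here because mkdir is called with exist_ok=True.

```python
import tempfile
from pathlib import Path

# Directories this run has already created (hypothetical name).
created_dirs = set()


def ensure_dir(directory: Path) -> bool:
    """Create the directory once; return True only if the filesystem was touched."""
    if directory in created_dirs:
        return False  # already handled, skip the filesystem call
    directory.mkdir(parents=True, exist_ok=True)
    created_dirs.add(directory)
    return True


# Small demonstration in a throwaway folder.
with tempfile.TemporaryDirectory() as tmp:
    well_dir = Path(tmp) / "plate_01" / "A01_1"
    first_call = ensure_dir(well_dir)   # creates plate_01/A01_1
    second_call = ensure_dir(well_dir)  # served from the set, no mkdir
    dir_exists = well_dir.is_dir()
```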
Python's built-in concurrent.futures.ThreadPoolExecutor is used to process multiple images concurrently across multiple threads. This kind of parallelism is well suited to I/O-bound tasks: while one thread waits for an I/O operation to finish, other threads can continue working.
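The effect of overlapping waits can be seen in a small experiment. In this sketch, fake_io_task is a hypothetical stand-in that simply sleeps, imitating a thread blocked on disk reads and writes; it does not convert any images.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io_task(task_id: int) -> int:
    """Stand-in for converting one image: waits as a blocked I/O call would."""
    time.sleep(0.1)
    return task_id


def run_threaded(task_ids, max_workers: int):
    """Run all tasks on a thread pool; return results in input order and elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        results = list(executor.map(fake_io_task, task_ids))
    return results, time.perf_counter() - start


# Eight tasks of 0.1 s each would take about 0.8 s one after another;
# with eight threads the waits overlap.
results, elapsed = run_threaded(range(8), max_workers=8)
print(results, f"{elapsed:.2f}s")
```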
As an alternative to threading, the code can also use Python's ProcessPoolExecutor for parallel processing, which distributes the work among multiple CPU cores. Although this approach can be more suitable for CPU-intensive tasks, it generally has higher overhead than threading. The choice between threading and multiprocessing therefore depends on the specific workload and environment.
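Because both executors share the same interface, switching between them can be reduced to a single flag. The sketch below assumes hypothetical helper names (executor_class, run_batch) and a trivial stand-in worker; the script itself switches by commenting and uncommenting lines instead.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def executor_class(use_processes: bool):
    """Pick the pool type: processes for CPU-heavy work, threads otherwise."""
    return ProcessPoolExecutor if use_processes else ThreadPoolExecutor


def run_batch(worker, items, use_processes: bool = False, max_workers=None):
    """Apply worker to every item on the chosen pool and return results in order."""
    pool_cls = executor_class(use_processes)
    with pool_cls(max_workers=max_workers) as executor:
        return list(executor.map(worker, items))


# Threaded run with a trivial stand-in worker (len) instead of a real conversion.
lengths = run_batch(len, ["a.tiff", "bb.tiff"], use_processes=False, max_workers=2)
print(lengths)  # [6, 7]
```

With use_processes=True, the worker must be picklable (for example, a function defined at module level), and on platforms that start worker processes with spawn, such as Windows and macOS, the call should be placed under an if __name__ == "__main__": guard.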