aramis-lab/clinica

PET Pipeline: Non-Selective Reprocessing of Eligible BIDS Entries

Opened this issue · 5 comments

Describe the bug
Unlike the T1 pipeline, the PET pipeline does not skip already processed BIDS entries. As a result, the PET pipeline reprocesses all eligible BIDS entries on every run.

To Reproduce
Steps to reproduce the behavior:

  1. Execute the PET pipeline on ADNI data, e.g. using the 18FAV45 tracer with cerebellumPons2 SUVR.
  2. Do it again on the same BIDS and CAPS folders.
  3. Notice that it reprocesses all eligible BIDS entries, regardless of whether they have been processed before.

Expected behavior
The PET pipeline should skip already processed BIDS entries, matching the behavior of the T1 pipeline.

Screenshots
N/A

Desktop:

  • OS: Ubuntu 20.04
  • Clinica: dev version 0.7.7

Additional context
This issue affects the efficiency of the PET pipeline. Skipping already processed BIDS entries would significantly shorten re-runs and avoid redundant processing.

Hi @souravraha

Thanks for pointing this out. I'm not sure I fully understand what you mean though.
I haven't looked deeply into it, but as long as you provide a working directory to these pipelines (and use the same working directory the second time you run the pipeline) Nipype should be clever enough to not re-run the computations. If you look at the logs, you should see that it is using cached results.
I don't believe Clinica explicitly implements a caching mechanism other than this one (might have to double check that...).

Could you share the commands that you executed?

@NicolasGensollen
Upon re-running the command:

```shell
clinica run pet-linear --save_pet_in_t1w_space -wd /DATA/user/tmp/ bids CAPS 18FAV45 cerebellumPons2 -tsv partial_list.tsv
```

I noticed that the PET pipeline processes each BIDS subject sequentially, despite having previously executed the pipeline.

In contrast, the T1 pipeline issues a warning indicating that each BIDS subject has already been processed and promptly skips them, resulting in a significantly shorter re-execution time, typically completing within a few seconds.

@souravraha I think you're right.

There is some logic implemented in the AnatLinearPipeline which looks for already processed images and skips them:

```python
processed_ids = self.get_processed_images(
    self.caps_directory, self.subjects, self.sessions
)
if len(processed_ids) > 0:
    cprint(
        msg=f"Clinica found {len(processed_ids)} image(s) already processed in CAPS directory:",
        lvl="warning",
    )
    for image_id in processed_ids:
        cprint(msg=f"{image_id.replace('_', ' | ')}", lvl="warning")
    cprint(msg="Image(s) will be ignored by Clinica.", lvl="warning")
input_ids = [
    f"{p_id}_{s_id}" for p_id, s_id in zip(self.subjects, self.sessions)
]
to_process_ids = list(set(input_ids) - set(processed_ids))
self.subjects, self.sessions = extract_subjects_sessions_from_filename(
    to_process_ids
)
```

This is based on the implementation of this method:

```python
def get_processed_images(
    caps_directory: Path, subjects: List[str], sessions: List[str]
) -> List[str]:
    from clinica.utils.filemanip import extract_image_ids
    from clinica.utils.input_files import T1W_LINEAR_CROPPED
    from clinica.utils.inputs import clinica_file_reader

    image_ids: List[str] = []
    if caps_directory.is_dir():
        cropped_files, _ = clinica_file_reader(
            subjects, sessions, caps_directory, T1W_LINEAR_CROPPED, False
        )
        image_ids = extract_image_ids(cropped_files)
    return image_ids
```

Which has an abstract definition in the engine, but is not implemented by all pipelines (for example PET pipelines do not implement this).
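A PET implementation could follow the same shape. Here is a hedged, Clinica-free sketch: the helper name, the CAPS layout (`subjects/<sub>/<ses>/pet_linear/`), and the `*_pet.nii.gz` output pattern are assumptions for illustration, not the actual pet-linear outputs.

```python
from pathlib import Path
from typing import List

# Hypothetical sketch of a get_processed_images for a PET pipeline, written
# without Clinica imports. The CAPS layout and output filename pattern below
# are assumptions for illustration, not the real pet-linear file patterns.

def get_processed_pet_images(
    caps_directory: Path, subjects: List[str], sessions: List[str]
) -> List[str]:
    image_ids: List[str] = []
    if not caps_directory.is_dir():
        return image_ids
    for subject, session in zip(subjects, sessions):
        # Assumed output location: CAPS/subjects/<sub>/<ses>/pet_linear/
        pet_dir = caps_directory / "subjects" / subject / session / "pet_linear"
        if pet_dir.is_dir() and any(pet_dir.glob("*_pet.nii.gz")):
            image_ids.append(f"{subject}_{session}")
    return image_ids
```

The returned `<subject>_<session>` IDs could then feed the same set-difference filtering the AnatLinearPipeline already uses.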

Even stranger, some pipelines (like DWIPreprocessingUsingT1) seem to implement the method but apply no skipping logic when reading input files...

I think we should definitely fix this and offer a similar user experience for all pipelines. I'll add this to my todo list 😅

@NicolasGensollen

While we're discussing this, I'd like to revisit a previous issue, #1060, which you helped resolve. After incorporating your enhancements and re-running the converter on the existing BIDS directory, I hit the same errors mentioned in #1060. Digging deeper, I found files with problematic suffixes ("ADC", "real") in the BIDS directory, left over from an earlier, buggy version of the converter. After removing these stale files, the converter ran successfully with your enhancements.

In other words, the fix may not take effect until the problematic files are removed from disk. This could be mitigated if the converter detected such leftover files up front. I wanted to flag this for your awareness.

This issue is considered stale because it has not received further activity for the last 14 days. You may remove the inactive label or add a comment, otherwise it will be closed after the next 14 days.