/diagnostics-Lemanics

Lemanics' team project for the Nipraxis course

Lemanics team project

Repository containing the Lemanics' team project for the 2022 edition of the nipraxis course.

Diagnostics project

Goal

The purpose of this project is to implement a framework to detect outliers. This README file has instructions on how to get, validate and process the data.

Guidelines

  • Scripts go in the scripts directory.

  • Library code (Python modules) goes in the findoutlie directory.

  • You should put the code in the findoutlie directory on your Python path, so that the scripts can import it.

Project plan

Group

Work in groups of 2 or 3 to facilitate peer review (please add your group below; ideally, mix levels of Python experience within each group):

  • group 1 : ....
  • group 2 : ....

Git workflow reminder: for each feature, create a branch, then merge it into origin, etc.

Metrics

For now, the idea is to pick two metrics (one per group, more if we have time) and then compute a final score.

Candidate metrics, for now:

The part below was added by Soraya; let me know what you think and feel free to add your own metrics. I think picking SNR and Shannon entropy could be nice. Shannon entropy will be more difficult to implement. If we have time, we can then pick another one.

  • Metrics based on noise measurements - pick 1 metric

    • Classic signal-to-noise ratio (SNR): there are four approaches to measuring SNR: 1) the pixel-by-pixel standard deviation (SD) across multiple repeated acquisitions; 2) the signal statistics in a difference image; and 3) and 4) the statistics in two separate regions of a single image, using either the mean value or the SD of the background noise (Dietrich 2007). There is also the temporal variant, tSNR, which is the average BOLD signal across time divided by the temporal standard deviation map (Kruger 2001).
  • Metrics based on spatial information

    • Shannon entropy of voxel intensities, for blurriness and ghosting (Atkinson 1997). Lower values are better. Cf. "Shannon entropy H was calculated in each voxel independently (i.e. using the voxel probability distribution obtained by standard histogram method)"; see the full formula in the article, DiNuzzo 2003.
  • Metrics based on temporal information

    • DVARS: rate of change per frame (can be spatial or temporal, cf. the practical)
    • tSNR: temporal SNR, see the SNR entry above
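The two temporal metrics listed above can be sketched with NumPy as follows. This is only a starting point, not the agreed implementation: the function names `tsnr` and `dvars`, the (x, y, z, time) array layout, and the choice to return 0 where the temporal SD is zero are all assumptions of this sketch.

```python
import numpy as np


def tsnr(data):
    """Temporal SNR map: mean over time divided by SD over time.

    `data` is assumed to be a 4D array with time on the last axis.
    Voxels whose temporal SD is zero get a tSNR of 0 (an assumption made
    here to avoid dividing by zero).
    """
    mean = data.mean(axis=-1)
    sd = data.std(axis=-1)
    out = np.zeros_like(mean, dtype=float)
    np.divide(mean, sd, out=out, where=sd > 0)
    return out


def dvars(data):
    """Root mean square of the voxel-wise difference between consecutive volumes.

    Returns one value per transition between volumes, so the output has
    length n_volumes - 1.
    """
    vols = data.reshape(-1, data.shape[-1]).astype(float)  # voxels x time
    diffs = np.diff(vols, axis=1)
    return np.sqrt(np.mean(diffs ** 2, axis=0))
```

A sudden jump in one volume shows up as two large DVARS values (the transition into and out of that volume), which is worth keeping in mind when mapping DVARS values back to volume indices.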

--> Updating & using findoutlie/metric.py/metric_name
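As an illustration of what a function in findoutlie/metric.py might look like, here is a minimal sketch of a histogram-based Shannon entropy. Note the assumptions: it computes one entropy value per volume from the histogram of voxel intensities (the quoted article computes entropy per voxel instead), and the names `shannon_entropy` and `volume_entropies` and the default of 64 bins are hypothetical choices, not something the project has settled on.

```python
import numpy as np


def shannon_entropy(values, bins=64):
    """Shannon entropy, in bits, of the histogram of `values`."""
    counts, _ = np.histogram(np.ravel(values), bins=bins)
    # Drop empty bins: they contribute nothing and log2(0) is undefined.
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log2(p)))


def volume_entropies(data, bins=64):
    """Entropy of the intensity histogram of each volume in a 4D array.

    `data` is assumed to have time on the last axis.
    """
    return np.array([shannon_entropy(data[..., i], bins)
                     for i in range(data.shape[-1])])
```

The result depends on the number of bins, so the same `bins` value should be used for every scan that is compared.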

Distribution

The distribution of each metric across the whole dataset should be plotted to facilitate the detection of outliers (boxplots per metric?).

--> Updating & using findoutlie/outfind.py/detect_outliers
--> should use findoutlie/detectors.py/iqr_detector, which detects outliers in measures using the interquartile range
--> outfind.py will then be used by scripts/find_outliers.py to print the list of outliers
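A minimal sketch of an interquartile-range detector is shown below. The function name comes from the plan above, but the signature, the `iqr_proportion` parameter, and its default of 1.5 (the usual boxplot whisker rule) are assumptions of this sketch rather than the project's actual code.

```python
import numpy as np


def iqr_detector(measures, iqr_proportion=1.5):
    """Flag values lying far outside the interquartile range.

    Returns a boolean array, True where a measure is below
    Q1 - iqr_proportion * IQR or above Q3 + iqr_proportion * IQR.
    """
    measures = np.asarray(measures, dtype=float)
    q1, q3 = np.percentile(measures, [25, 75])
    iqr = q3 - q1
    low = q1 - iqr_proportion * iqr
    high = q3 + iqr_proportion * iqr
    return (measures < low) | (measures > high)
```

Calling `np.flatnonzero(iqr_detector(values))` then gives the indices of the flagged values, which is the form the outlier list needs.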

--> Therefore the function to plot the distribution should be in scripts/find_outliers.py

Computation of a final score

How? To be filled in.

Keep the metrics independent (an array of scores)? Or assign a different weight to each metric?
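To make the weighting option concrete for discussion, here is one possible sketch: standardise each metric across volumes, then take a weighted mean of the absolute z-scores. Nothing here is decided; the function name `combined_score`, the (n_volumes, n_metrics) layout, and the use of z-scores are all assumptions.

```python
import numpy as np


def combined_score(metrics, weights=None):
    """Weighted mean of absolute z-scores, giving one score per volume.

    `metrics` is an (n_volumes, n_metrics) array. `weights` has one entry
    per metric and defaults to equal weights.
    """
    metrics = np.asarray(metrics, dtype=float)
    sd = metrics.std(axis=0)
    sd[sd == 0] = 1.0  # a constant metric then contributes a z-score of 0
    z = np.abs((metrics - metrics.mean(axis=0)) / sd)
    if weights is None:
        weights = np.ones(metrics.shape[1])
    weights = np.asarray(weights, dtype=float)
    return z @ weights / weights.sum()
```

Keeping the metrics independent would simply mean skipping this step and running the outlier detection on each column separately.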

NB - Ratings

Ratings on:
  • the quality of your outlier detection as assessed by the improvement in the statistical testing for the experimental model after removing the outliers;
  • the generality of your outlier detection as assessed by the improvement in the statistical testing for the experimental model after removing the outliers, for another similar dataset;
  • the quality of your code;
  • the quality and transparency of your process, from your interactions on GitHub;
  • the quality of your arguments about the scans rejected as outliers.

Installation & Usage

Get the data

cd data
curl -L https://figshare.com/ndownloader/files/34951602 -o group_data.tar
tar xvf group_data.tar
cd ..

Check the data

python3 scripts/validate_data.py data

Find outliers

python3 scripts/find_outliers.py data

This should print output to the terminal of the form:

<filename>, <outlier_index>, <outlier_index>, ...
<filename>, <outlier_index>, <outlier_index>, ...

Where <filename> is the name of the image that has outlier scans, and <outlier_index> is an index to the volume in the 4D image that you have identified as an outlier. 0 refers to the first volume. For example:

data/group-01/sub-01/func/sub-01_task-taskzero_run-01_bold.nii.gz, 3, 21, 22, 104
data/group-01/sub-01/func/sub-01_task-taskzero_run-02_bold.nii.gz, 11, 33, 91
data/group-01/sub-03/func/sub-03_task-taskzero_run-02_bold.nii.gz, 101, 102, 132
data/group-01/sub-08/func/sub-08_task-taskzero_run-01_bold.nii.gz, 0, 1, 2, 166, 167
data/group-01/sub-09/func/sub-09_task-taskzero_run-01_bold.nii.gz, 3
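
The output format above can be produced with a small helper like the one sketched here. The name `format_outlier_line` is hypothetical; scripts/find_outliers.py may build these lines differently.

```python
def format_outlier_line(filename, outlier_indices):
    """Return '<filename>, <outlier_index>, <outlier_index>, ...'.

    Indices are 0-based volume indices and are converted to plain integers
    so that NumPy integer types print cleanly.
    """
    return ", ".join([str(filename)] + [str(int(i)) for i in outlier_indices])


if __name__ == "__main__":
    print(format_outlier_line(
        "data/group-01/sub-01/func/sub-01_task-taskzero_run-02_bold.nii.gz",
        [11, 33, 91],
    ))
```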