Non-negative matrix Factorisation of Tractography data

This is a repo merging NFacT (Shaun Warrington, Ellie Thompson, and Stamatios Sotiropoulos) and ptx_decomp (Saad Jbabdi and Rogier Mars).

What is NFACT

NFACT (Non-negative matrix Factorisation of Tractography data) is a set of modules (as well as an end to end pipeline) that decomposes tractography data using NMF/ICA.

It consists of three "main" decomposition modules:

- nfact_pp (Pre-process data for decomposition)

- nfact_decomp (Decomposes a single or average group matrix using NMF or ICA)

- nfact_dr (Dual regression on group matrix)

as well as three auxiliary "modules":

- nfact_config (creates config files for the pipeline and changing any hyperparameters)

- nfact_Qc (Creates hitmaps to check for bias in decomposition)

- nfact_glm (To run hypothesis testing)

and a pipeline wrapper

- nfact (runs either all three main modules or just nfact_decomp and nfact_dr)

NFACT pipeline

This pipeline runs nfact_pp, nfact_decomp and nfact_dr on tractography data that has been processed by bedpostx.

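The pipeline first uses nfact_pp to create the omatrix2, a seed-by-target connectivity matrix produced by the probtrackx2 omatrix2 option. It then decomposes the group matrix with nfact_decomp and runs dual regression with nfact_dr.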


usage: nfact [-h] [-l LIST_OF_SUBJECTS] [-s SEED [SEED ...]] [-c CONFIG] [-S] [-i REF] [-b BPX_PATH] [-w WARPS [WARPS ...]] [-r ROIS [ROIS ...]] [-t TARGET2] [-d DIM] [-o OUTDIR] [-a ALGO]

  -h, --help            show this help message and exit

  -l LIST_OF_SUBJECTS, --list_of_subjects LIST_OF_SUBJECTS
                        Filepath to a list of subjects.
  -s SEED [SEED ...], --seed SEED [SEED ...]
                        A single or list of seeds
  -c CONFIG, --config CONFIG
                        An nfact_config file. If this is provided no other arguments are needed.
  -S, --skip            Skips NFACT_PP. Pipeline still assumes that NFACT_PP has been ran before.

  -i REF, --image_standard_space REF
                        Standard space reference image
  -b BPX_PATH, --bpx BPX_PATH
                        Path to Bedpostx folder inside a subjects directory.
  -w WARPS [WARPS ...], --warps WARPS [WARPS ...]
                        Path to warps inside a subjects directory (can accept multiple arguments)
  -r ROIS [ROIS ...], --rois ROIS [ROIS ...]
                        A single or list of ROIS
  -t TARGET2, --target TARGET2
                        Path to target image. If not given will create a whole mask from reference image

  -d DIM, --dim DIM     Number of dimensions/components
  -o OUTDIR, --outdir OUTDIR
                        Path to where to create an output folder
  -a ALGO, --algo ALGO  What algorithm to run. Options are: ICA (default), or NMF.

example call:

nfact --list_of_subjects /absolute_path/sub_list \
--seed thalamus.nii.gz \
--algo NMF \
--dim 100 \
--outdir /absolute_path/save_directory \
--warps standard2acpc_dc.nii.gz acpc_dc2standard.nii.gz \
--image_standard_space $FSLDIR/data/standard/MNI152_T1_2mm_brain.nii.gz \
--bpx Diffusion.bedpostX
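
When the pipeline is launched from a script, the call above can be assembled as an argument list so that paths containing spaces survive intact. The sketch below only builds the command from flags listed in the help text and executes nothing. The helper name `build_nfact_command` and the example paths are illustrative assumptions, not part of NFACT.

```python
import shlex


def build_nfact_command(sub_list, seeds, outdir, dim, algo="ICA",
                        warps=(), bpx=None):
    """Assemble an nfact call as an argument list (hypothetical helper)."""
    cmd = ["nfact", "--list_of_subjects", str(sub_list)]
    cmd += ["--seed", *map(str, seeds)]
    cmd += ["--algo", algo, "--dim", str(dim), "--outdir", str(outdir)]
    if warps:
        cmd += ["--warps", *map(str, warps)]
    if bpx is not None:
        cmd += ["--bpx", str(bpx)]
    return cmd


cmd = build_nfact_command(
    sub_list="/home/study/sub list.txt",  # illustrative path with a space
    seeds=["thalamus.nii.gz"],
    outdir="/home/study/results",
    dim=100,
    algo="NMF",
    warps=["standard2acpc_dc.nii.gz", "acpc_dc2standard.nii.gz"],
    bpx="Diffusion.bedpostX",
)
# shlex.join quotes each argument so the printed line can be pasted in a shell
print(shlex.join(cmd))
```

Once NFACT is installed, the same list could be handed to `subprocess.run(cmd, check=True)`.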

With a config file:

nfact --config /absolute_path/nfact_config.config

Pre-processing of tractography data for decomposition with NFacT (Non-negative matrix Factorisation of Tractography data)

Under the hood, NFACT PP runs probtrackx2 with the omatrix2 option to produce a seed-by-target connectivity matrix.

Input for nfact_pp

Required before running NFACT PP:

- Crossing-fibre diffusion modelled data (bedpostX)
- Seeds (either surfaces or volumes)

NFACT PP has three modes: surface seed, volume and filetree.

Required input:

- List of subjects
- Output directory

Input needed for filetree mode:

- .tree file (NFACT_PP comes with some defaults such as hcp)

Input needed for both surface and volume mode:

- Seeds path inside a subject's folder
- Warps path inside a subject's folder
- bedpostx folder path inside a subject's folder

Input needed for surface seed mode:

- Seeds as surfaces
- ROIs as surfaces (medial wall)

Input needed for volume mode:

- Seeds as volumes

NFACT PP input folder

NFACT PP can be used in a folder-agnostic way by providing the paths to seeds/bedpostX/target inside a subject folder (e.g. --seed seeds/amygdala.nii.gz).

The other way is to use the --file_tree command with the name of a file tree (see https://open.win.ox.ac.uk/pages/fsl/file-tree/index.html for further details on filetree). In this case seeds/rois/bedpostx do not need to be specified, as nfact_pp will try to find the appropriate files.

nfact_pp --file_tree hcp --list_of_subjects /home/study/list_of_subjects

Filetrees are saved in the filetrees folder in nfact, so custom filetrees can be put there and called in the same way as the command above. NFACT_PP currently has a built-in filetree for HCP (from qunex output) to perform full brain tractography.

Use of custom filetree

Seed files are aliased as (seed), medial wall as (medial_wall), warps as (diff2std, std2diff) and bedpostX as (bedpostX). Two seeds are supported if the seeds are bilateral, indicated with {hemi}.seed, with the actual seed names being L.seed.nii.gz/R.seed.nii.gz. A single seed can be given as well.
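
To make the {hemi} convention concrete, the sketch below expands a bilateral seed template into the two file names described above and passes a single seed through unchanged. It only mimics the placeholder substitution. It is not the file-tree library's API, and the function name `expand_hemi` is an assumption.

```python
def expand_hemi(template, hemispheres=("L", "R")):
    """Fill the {hemi} placeholder once per hemisphere; pass through otherwise."""
    if "{hemi}" not in template:
        return [template]
    return [template.replace("{hemi}", hemi) for hemi in hemispheres]


bilateral = expand_hemi("{hemi}.seed.nii.gz")
single = expand_hemi("seed.nii.gz")
print(bilateral)
print(single)
```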


usage: nfact_pp [-h] [-hh] [-O] [-l LIST_OF_SUBJECTS] [-o OUTDIR] [-f FILE_TREE] [-s SEED [SEED ...]] [-w WARPS [WARPS ...]] [-b BPX_PATH] [-m MEDIAL_WALL [MEDIAL_WALL ...]] [-i REF] [-t TARGET2] [-N NSAMPLES] [-mm MM_RES] [-p PTX_OPTIONS] [-e EXCLUSION]
                [-S [STOP ...]] [-n N_CORES] [-C] [-cq CLUSTER_QUEUE] [-cr CLUSTER_RAM] [-ct CLUSTER_TIME] [-cqos CLUSTER_QOS]

  -h, --help            show this help message and exit
  -hh, --verbose_help   Prints help message and example usages
  -O, --overwrite       Overwrite previous file structure

Compulsory Arguments:
  -l LIST_OF_SUBJECTS, --list_of_subjects LIST_OF_SUBJECTS
                        A list of subjects in text form. If not provided NFACT PP will use all subjects in the study folder. All subjects need full file path to subjects directory
  -o OUTDIR, --outdir OUTDIR
                        Directory to save results in

  -f FILE_TREE, --file_tree FILE_TREE
                        Use this option to provide name of predefined file tree to perform whole brain tractography. NFACT_PP currently comes with HCP filetree. See documentation for further information.

Tractography options:
  -s SEED [SEED ...], --seed SEED [SEED ...]
                        A single or list of seeds
  -w WARPS [WARPS ...], --warps WARPS [WARPS ...]
                        Path to warps inside a subjects directory (can accept multiple arguments)
  -b BPX_PATH, --bpx BPX_PATH
                        Path to Bedpostx folder inside a subjects directory.
  -m MEDIAL_WALL [MEDIAL_WALL ...]
                        REQUIRED FOR SURFACE MODE: Medial wall file. Use when doing whole brain surface tractography to provide medial wall.
  -i REF, --ref REF     Standard space reference image. Default is $FSLDIR/data/standard/MNI152_T1_2mm_brain.nii.gz
  -t TARGET2, --target TARGET2
                        Name of target. If not given will create a whole mask from reference image
  -N NSAMPLES, --nsamples NSAMPLES
                        Number of samples per seed used in tractography (default = 1000)
  -mm MM_RES, --mm_res MM_RES
                        Resolution of target image (Default = 2 mm)
  -p PTX_OPTIONS, --ptx_options PTX_OPTIONS
                        Path to ptx_options file for additional options
  -e EXCLUSION, --exclusion EXCLUSION
                        Path to an exclusion mask. Will reject pathways passing through locations given by this mask
  -S [STOP ...], --stop [STOP ...]
                        Use wtstop and stop in the tractography. Takes a file path to a json file containing stop and wtstop masks, JSON keys must be stopping_mask and wtstop_mask. Argument can be used with the --filetree, in that case no json file is needed.

Parallel Processing arguments:
  -n N_CORES, --n_cores N_CORES
                        If should parallel process and with how many cores

Cluster Arguments:
  -C, --cluster         Use cluster environment
  -cq CLUSTER_QUEUE
                        Cluster queue to submit to
  -cr CLUSTER_RAM, --cluster_ram CLUSTER_RAM
                        Ram that job will take. Default is 60
  -ct CLUSTER_TIME, --cluster_time CLUSTER_TIME
                        Time that job will take. nfact_pp will assign a time if none given
  -cqos CLUSTER_QOS, --cluster_qos CLUSTER_QOS
                        Set the qos for the cluster

Example Usage:
    Surface mode:
           nfact_pp --list_of_subjects /home/study/sub_list \
               --outdir /home/study \
               --bpx /path_to/.bedpostX \
               --seed /path_to/L.white.32k_fs_LR.surf.gii /path_to/R.white.32k_fs_LR.surf.gii \
               -m /path_to/L.atlasroi.32k_fs_LR.shape.gii /path_to/R.atlasroi.32k_fs_LR.shape.gii \
               --warps /path_to/stand2diff.nii.gz /path_to/diff2stand.nii.gz \
               --n_cores 3

    Volume mode:
            nfact_pp --list_of_subjects /home/study/sub_list \
                --outdir /home/study \
                --bpx /path_to/.bedpostX \
                --seed /path_to/L.white.nii.gz /path_to/R.white.nii.gz \
                --warps /path_to/stand2diff.nii.gz /path_to/diff2stand.nii.gz \
                --ref MNI152_T1_1mm_brain.nii.gz \
                --target dlpfc.nii.gz

    Filetree mode:
        nfact_pp --file_tree hcp \
            --list_of_subjects /home/study/sub_list \
            --outdir /home/study \
            --n_cores 3
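
The --stop option in the help text above takes a JSON file whose keys must be stopping_mask and wtstop_mask. The following is a minimal sketch of writing and checking such a file. The mask file names, the use of lists as values and the helper `check_stop_file` are assumptions made for illustration, so check the NFACT documentation for the exact value layout.

```python
import json
import tempfile
from pathlib import Path

REQUIRED_KEYS = {"stopping_mask", "wtstop_mask"}


def check_stop_file(path):
    """Return the parsed JSON if both required keys are present."""
    with open(path) as handle:
        masks = json.load(handle)
    missing = REQUIRED_KEYS - set(masks)
    if missing:
        raise KeyError(f"stop file is missing keys: {sorted(missing)}")
    return masks


# Hypothetical mask names; the value layout (lists of paths) is an assumption.
stop_masks = {
    "stopping_mask": ["/path_to/stop_mask.nii.gz"],
    "wtstop_mask": ["/path_to/wtstop_mask.nii.gz"],
}

with tempfile.TemporaryDirectory() as tmp:
    stop_file = Path(tmp) / "stop_masks.json"
    stop_file.write_text(json.dumps(stop_masks, indent=4))
    loaded = check_stop_file(stop_file)

print(sorted(loaded))
```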


## NFACT decomp
This is the main decomposition module of NFACT. It runs either ICA or NMF and saves the components. Components can also be normalised, and winner-takes-all maps can be saved.
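
To illustrate what an NMF decomposition produces, here is a toy sketch that factorises a small non-negative seed-by-target matrix X into two non-negative factors, X ≈ G W. NFACT itself relies on scikit-learn for this step; the pure-Python multiplicative-update solver below is a stand-in chosen to keep the example dependency-free. The names `G`, `W` and `nmf`, and the toy matrix, are assumptions.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    return [list(row) for row in zip(*m)]


def frobenius_error(x, g, w):
    """Squared reconstruction error ||X - GW||^2."""
    approx = matmul(g, w)
    return sum((xv - av) ** 2
               for xr, ar in zip(x, approx) for xv, av in zip(xr, ar))


def nmf(x, dim, n_iter=200, seed=0, eps=1e-9):
    """Lee-Seung multiplicative updates for X ~ G W with G, W >= 0."""
    rng = random.Random(seed)
    n_rows, n_cols = len(x), len(x[0])
    g = [[rng.random() for _ in range(dim)] for _ in range(n_rows)]
    w = [[rng.random() for _ in range(n_cols)] for _ in range(dim)]
    start_error = frobenius_error(x, g, w)
    for _ in range(n_iter):
        # W <- W * (G^T X) / (G^T G W)
        gt = transpose(g)
        num, den = matmul(gt, x), matmul(matmul(gt, g), w)
        w = [[wv * n / (d + eps) for wv, n, d in zip(wr, nr, dr)]
             for wr, nr, dr in zip(w, num, den)]
        # G <- G * (X W^T) / (G W W^T)
        wt = transpose(w)
        num, den = matmul(x, wt), matmul(g, matmul(w, wt))
        g = [[gv * n / (d + eps) for gv, n, d in zip(gr, nr, dr)]
             for gr, nr, dr in zip(g, num, den)]
    return g, w, start_error


# Toy 4 seeds x 5 targets matrix built from two non-negative patterns
X = [[2, 2, 0, 0, 0],
     [4, 4, 0, 0, 0],
     [0, 0, 3, 3, 3],
     [0, 0, 1, 1, 1]]
G, W, start_error = nmf(X, dim=2)
final_error = frobenius_error(X, G, W)
print(f"error before: {start_error:.3f}, after: {final_error:.6f}")
```

Because every update multiplies by a non-negative ratio, both factors stay non-negative throughout, which is the property that distinguishes NMF from ICA.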

### Usage

usage: nfact_decomp [-h] [-l LIST_OF_SUBJECTS] [-o OUTDIR] [-d DIM] [--seeds SEEDS] [-m MIGP] [-a ALGO] [-W] [-z WTA_ZTHR] [-N] [-S] [-O] [-c CONFIG]

options:
  -h, --help            show this help message and exit
  -l LIST_OF_SUBJECTS, --list_of_subjects LIST_OF_SUBJECTS
                        REQUIRED: Filepath to a list of subjects. List can contain a single subject.
  -o OUTDIR, --outdir OUTDIR
                        REQUIRED: Path to output folder
  -d DIM, --dim DIM     REQUIRED: Number of dimensions/components
  --seeds SEEDS, -s SEEDS
                        REQUIRED: File of seeds used in NFACT_PP/probtrackx
  -m MIGP, --migp MIGP  MELODIC's Incremental Group-PCA dimensionality (default is 1000)
  -a ALGO, --algo ALGO  What algorithm to run. Options are: ICA (default), or NMF.
  -W, --wta             Save winner-takes-all maps
  -z WTA_ZTHR, --wta_zthr WTA_ZTHR
                        Winner-takes-all threshold (default=0.)
  -N, --normalise       normalise components by scaling
  -S, --sign_flip       sign flip components
  -O, --overwrite       Overwrite previous file structure. Useful if wanting to perform multiple GLMs or ICA and NMF
  -c CONFIG, --config CONFIG
                        Provide config file to change hyperparameters for ICA and NMF. Please see scikit-learn documentation for NMF and FastICA for further details

An example call

nfact_decomp --list_of_subjects /absolute_path/sub_list \
--seeds /absolute_path/seeds.txt \
--outdir /absolute_path/study_directory \
--algo ICA \
--migp 1000 \
--dim 100 --normalise --wta --sign_flip


## NFACT dr

This is the dual regression module of NFACT. The dual regression technique depends on which decomposition method was used: if NMF was used, non-negative least squares regression is used; if ICA was used, standard regression is used.
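
The two regression stages can be sketched on a toy example. The code below shows the standard (ICA-style) least-squares case only, using a small pure-Python solver. The order of the stages, the factor names `G` and `W` and all helper names are assumptions for illustration, not NFACT's implementation.

```python
def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(ra) + list(rb) for ra, rb in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for row in range(n):
            if row != col:
                factor = aug[row][col]
                aug[row] = [rv - factor * cv
                            for rv, cv in zip(aug[row], aug[col])]
    return [row[n:] for row in aug]


def least_squares(design, data):
    """Ordinary least squares via the normal equations (D^T D) B = D^T Y."""
    dt = transpose(design)
    return solve(matmul(dt, design), matmul(dt, data))


def dual_regression(subject, group_g):
    """Stage 1: subject W from group G. Stage 2: subject G from subject W."""
    subject_w = least_squares(group_g, subject)  # components x targets
    subject_g = transpose(
        least_squares(transpose(subject_w), transpose(subject)))  # seeds x components
    return subject_g, subject_w


# Toy data: 4 seeds, 2 components, 3 targets
group_g = [[1, 0], [2, 0], [0, 1], [0, 3]]
true_w = [[1, 2, 0], [0, 1, 4]]
subject = matmul(group_g, true_w)
subject_g, subject_w = dual_regression(subject, group_g)
print(subject_w)
```

For an NMF decomposition, each least-squares call would be swapped for a non-negative solver such as `scipy.optimize.nnls`, applied one column at a time.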

### Usage

usage: nfact_dr [-h] [-l LIST_OF_SUBJECTS] [-o OUTDIR] [-a ALGO] [--seeds SEEDS] [-n NFACT_DECOMP_DIR] [-d DECOMP_DIR] [-N]

options:
  -h, --help            show this help message and exit
  -l LIST_OF_SUBJECTS, --list_of_subjects LIST_OF_SUBJECTS
                        REQUIRED: Filepath to a list of subjects
  -o OUTDIR, --outdir OUTDIR
                        REQUIRED: Path to output directory
  -a ALGO, --algo ALGO  REQUIRED: Which NFACT algorithm to perform dual regression on
  --seeds SEEDS, -s SEEDS
                        REQUIRED: File of seeds used in NFACT_PP/probtrackx
  -n NFACT_DECOMP_DIR, --nfact_decomp_dir NFACT_DECOMP_DIR
                        REQUIRED IF NFACT_DECOMP: Filepath to the NFACT_decomp directory. Use this if you have ran NFACT decomp
  -d DECOMP_DIR, --decomp_dir DECOMP_DIR
                        REQUIRED IF NOT NFACT_DECOMP: Filepath to decomposition components. WARNING NFACT decomp expects components to be named in a set way. See documentation for further info.
  -N, --normalise       normalise components by scaling

nfact_dr is independent of nfact_decomp; however, nfact_dr expects a strict file naming convention. If nfact_decomp has not been run, then the group average files and components must all be in the same folder. Components must be named W_dim* and G_dim*, with group average files named coords_for_fdt_matrix2 and lookup_tractspace_fdt_matrix2.nii.gz.


## NFACT Qc

This is a quality control module that creates a number of hitmaps that can be used to check for bias in decomposition.

Each map contains the number of times that voxel/vertex appears in the decomposition. 

## Output:

- hitmap_*.nii.gz: Volume nii component. Components are thresholded by zscoring to remove noise
- hitmap_*_raw.nii.gz: Volume nii component. Components are not thresholded
- mask_*.nii.gz: Volume nii component. Binary mask of thresholded components
- mask_*_raw.nii.gz: Volume nii component. Binary mask of unthresholded components     
- *.gii: Surface gii component. Components are thresholded by zscoring to remove noise
- *_raw.gii: Surface gii component. Components are not thresholded   
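
The idea behind the raw and thresholded outputs listed above can be sketched as a per-voxel count of the components in which that voxel appears. The exact z-scoring and thresholding used by NFACT are not specified here, so the rule below (population z-score per component, strict `>` against the threshold) and all names are assumptions.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardise one component; a flat component maps to all zeros."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def hitmap(components, threshold=None):
    """Count, per voxel, how many components include that voxel."""
    counts = [0] * len(components[0])
    for component in components:
        if threshold is None:
            hits = [v != 0 for v in component]  # raw: any non-zero weight
        else:
            hits = [z > threshold for z in zscore(component)]
        counts = [c + int(h) for c, h in zip(counts, hits)]
    return counts


# Two toy components over four voxels
components = [[1, 0, 0, 10],
              [0, 5, 0, 5]]
raw = hitmap(components)
thresholded = hitmap(components, threshold=0.5)
mask = [int(count > 0) for count in thresholded]  # binary mask of the hitmap
print(raw, thresholded, mask)
```

A voxel that is counted by many components, or by none, is the kind of pattern these maps are meant to expose.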

## Usage:

usage: nfact_Qc [-h] [-n NFACT_FOLDER] [-d DIM] [-a ALGO] [-t THRESHOLD] [-O]

options:
  -h, --help            show this help message and exit
  -n NFACT_FOLDER, --nfact_folder NFACT_FOLDER
                        REQUIRED: Path to nfact output folder
  -d DIM, --dim DIM     REQUIRED: Number of dimensions/components
  -a ALGO, --algo ALGO  REQUIRED: What algorithm to qc. Options are: NMF (default), or ICA.
  -t THRESHOLD, --threshold THRESHOLD
                        Threshold value for z scoring the normalised image
  -O, --overwrite       Overwrite previous QC


## NFACT config

NFACT config is a utility tool for nfact that creates a variety of config files to be used in nfact.

NFACT config can create:
1) nfact_config_pipeline.config. This JSON config file is used in the nfact pipeline to give greater control over parameters.
2) nfact_config_decomp.config. A config file to control the hyperparameters of the ICA and NMF functions.
3) nfact_config_sublist. A list of subjects in a folder.

## Usage:

usage: nfact_config [-h] [-C] [-D] [-s SUBJECT_LIST] [-o OUTPUT_DIR]

options:
  -h, --help            show this help message and exit
  -C, --config          Creates a config file for NFACT pipeline
  -D, --decomp_only     Creates a config file for scikit-learn function hyperparameters
  -s SUBJECT_LIST, --subject_list SUBJECT_LIST
                        Creates a subject list from a given directory
  -o OUTPUT_DIR, --output_dir OUTPUT_DIR
                        Where to save config file

Boolean values in a JSON file must be lower case, i.e. true and false. Unless you are familiar with JSON files, it is advised to use a JSON linter to check that the file is valid.
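
If no linter is to hand, Python's standard `json` module can serve as a quick validity check. The sketch below shows that Python-style capitalised booleans are rejected; the helper name `is_valid_json` is an assumption.

```python
import json


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


good = '{"overwrite": false, "skip": true}'
bad = '{"overwrite": False, "skip": True}'  # Python-style capitals are not JSON
print(is_valid_json(good), is_valid_json(bad))
```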

### nfact_config_pipeline.config overview

This is the config file for the nfact pipeline. Please check the individual modules for further details on arguments.

    {
        "global_input": {
            "list_of_subjects": "Required",
            "outdir": "Required",
            "seed": [
                "Required unless file_tree specified"
            ],
            "overwrite": false,
            "skip": false
        },
        "nfact_pp": {
            "warps": [],
            "bpx_path": false,
            "rois": [],
            "file_tree": false,
            "ref": false,
            "target2": false,
            "nsamples": "1000",
            "mm_res": "2",
            "ptx_options": false,
            "n_cores": false,
            "cluster": false
        },
        "nfact_decomp": {
            "dim": "Required",
            "migp": "1000",
            "algo": "ICA",
            "wta": false,
            "wta_zthr": "0.0",
            "normalise": false,
            "sign_flip": false,
            "config": false
        },
        "nfact_dr": {
            "normalise": false
        }
    }
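
The template marks mandatory fields with placeholder text beginning with Required. Before running the pipeline, a quick check for placeholders that have not been replaced can save a failed run. The sketch below is one way to do that; the function `find_unfilled` and the example values are assumptions, and the seed placeholder would still be flagged even when a file tree is used.

```python
def find_unfilled(config, prefix=""):
    """Return dotted keys whose value still starts with 'Required'."""
    unfilled = []
    for key, value in config.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            unfilled += find_unfilled(value, prefix=f"{path}.")
            continue
        items = value if isinstance(value, list) else [value]
        if any(isinstance(i, str) and i.startswith("Required") for i in items):
            unfilled.append(path)
    return unfilled


# A partially edited config (illustrative values only)
config = {
    "global_input": {
        "list_of_subjects": "/home/study/sub_list",
        "outdir": "Required",
        "seed": ["l_seed.nii.gz", "r_seed.nii.gz"],
    },
    "nfact_decomp": {"dim": "Required", "algo": "ICA"},
}
print(find_unfilled(config))
```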

Everything marked as Required must be given. rois, warps and seed must be given as JSON lists (the same syntax as a Python list), like this:

"seed": ["l_seed.nii.gz", "r_seed.nii.gz"]

### nfact_config_decomp.config 

This is the nfact_config_decomp.config file.

NFACT does its decomposition using scikit-learn's FastICA (https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.FastICA.html#sklearn.decomposition) and NMF (https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.NMF.html), so any of the hyperparameters of these functions can be altered by changing the values in the JSON file.

    {
        "ica": {
            "algorithm": "parallel",
            "whiten": "unit-variance",
            "fun": "logcosh",
            "fun_args": null,
            "max_iter": 200,
            "tol": 0.0001,
            "w_init": null,
            "whiten_solver": "svd",
            "random_state": null
        },
        "nmf": {
            "init": null,
            "solver": "cd",
            "beta_loss": "frobenius",
            "tol": 0.0001,
            "max_iter": 200,
            "random_state": null,
            "alpha_W": 0.0,
            "alpha_H": "same",
            "l1_ratio": 0.0,
            "verbose": 0,
            "shuffle": false
        }
    }
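
One plausible way such a file maps onto scikit-learn is to pass the block for the chosen algorithm as keyword arguments to the estimator. The sketch below assumes that mapping. How NFACT actually consumes the file is not shown here, and the helper names and the shortened config are illustrative.

```python
import json

# Shortened, illustrative config with max_iter raised from 200 to 500
DECOMP_CONFIG = """
{
    "ica": {"algorithm": "parallel", "whiten": "unit-variance", "max_iter": 500},
    "nmf": {"init": null, "solver": "cd", "max_iter": 500}
}
"""


def params_for(algo, config_text):
    """Pick the hyperparameter block matching the chosen algorithm."""
    config = json.loads(config_text)
    return config[algo.lower()]


def build_model(algo, dim, config_text):
    """Create the scikit-learn estimator (needs scikit-learn installed)."""
    from sklearn.decomposition import NMF, FastICA

    estimator = {"ica": FastICA, "nmf": NMF}[algo.lower()]
    return estimator(n_components=dim, **params_for(algo, config_text))


nmf_params = params_for("NMF", DECOMP_CONFIG)
print(nmf_params)
```

Note that JSON `null` becomes Python `None` on loading, which matches the scikit-learn defaults for parameters such as `init` and `random_state`.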

### nfact_config_sublist

Given a directory, NFACT config will attempt to work out all the subjects in that directory and write them to a file. Though nfact will try to filter out folders that aren't subjects, it isn't perfect, so please check the subject list.
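
As a rough picture of what building a subject list involves, the sketch below collects the visible sub-directories of a study folder and returns their full paths. The filtering rule (directories only, hidden folders skipped) and the name `list_subjects` are assumptions; NFACT's own filter may differ.

```python
import tempfile
from pathlib import Path


def list_subjects(study_dir):
    """Return sorted full paths of visible sub-directories in study_dir."""
    return sorted(
        str(path) for path in Path(study_dir).iterdir()
        if path.is_dir() and not path.name.startswith(".")
    )


# Build a throwaway study folder to demonstrate the filter
with tempfile.TemporaryDirectory() as study:
    for name in ("sub-01", "sub-02", ".git"):
        (Path(study) / name).mkdir()
    (Path(study) / "notes.txt").write_text("not a subject")
    subjects = list_subjects(study)
    names = [Path(s).name for s in subjects]

print(names)
```

A filter this simple would still let through folders such as `derivatives` or `logs`, which is why the generated list should be checked by hand.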


## NFACT glm

This module is currently a work in progress. The aim is to support hypothesis testing between groups.