Code for predicting individual differences in behavioral variables (e.g., intelligence, personality) from resting-state fMRI functional connectivity, using data from the Young Adult Human Connectome Project. The code depends on the HCP data structure. HCP data (MRI, and behavior/demographics) is available from the HCP website (https://www.humanconnectome.org/study/hcp-young-adult)
Authors: Julien Dubois (jcrdubois@gmail.com) and Paola Galdi (paola.galdi@gmail.com)
The code is provided as is, for documentation purposes.
The following files are included:
-
HCP_helpers.py : contains all helper functions and module imports for resting-state fMRI preprocessing, and prediction of behavior
-
personality.ipynb : reproduces analyses in
Dubois, J.*, Galdi, P.*, Han, Y., Paul, L.K. and Adolphs, R. Resting-state functional brain connectivity best predicts the personality dimension of openness to experience. Personality Neuroscience, in press. Preprint: https://www.biorxiv.org/content/early/2018/03/02/215129
- intelligence.ipynb : reproduces analyses in
Dubois, J., Galdi, P., Paul, L.K. and Adolphs, R. A distributed brain network predicts general intelligence from resting-state human neuroimaging data. Philosophical Transactions of the Royal Society B, in revision. Preprint: https://www.biorxiv.org/content/early/2018/01/31/257865
- recomputeNEOFFIfactors.ipynb : standalone notebook which can be used to correct NEOFAC_A [agreeableness factor] (see https://www.mail-archive.com/hcp-users@humanconnectome.org/msg05266.html: it appears that one of the test items was not properly reverse coded to compute the agreeableness factor). Data downloaded on 03/08/2018 was found to still be affected by the bug
If you use this code, please cite the most relevant of these two papers.
In order to run the provided code the following packages are needed:
- scipy==0.19.0
- pandas==0.20.1
- matplotlib==2.0.2
- statsmodels==0.8.0
- numpy==1.12.1
- nibabel==2.2.1
- nilearn==0.4.0
- nipype==1.0.1
- seaborn==0.8.1
- scikit_learn==0.19.1
- Customize parameters
- In the notebook in cell Set Parameters
- Launch the pipeline
- In the notebook cells can be executed sequentially (one by one or from the menu Cell->Run All)
Preprocessing is performed executing a sequence of steps or operations, each of which corresponds to a function in the HCP_helpers.py file. There are 7 available operations: Voxel Normalization, Detrending, Motion Regression, Scrubbing, Tissue Regression, Global Signal Regression and Temporal Filtering. The corresponding functions are built using the following template:
def OperationName(niiImg, flavor, masks, imgInfo):
- niiImg it is a list of two elements
[data,volData]
. If the file to be processes is a Nifti file, the variabledata
contains Nifti data and the variablevolData
isNone
. If the input file is a Cifti file, the variabledata
contains Cifti data and the variablevolData
contains volumetric data. Image data can be loaded with functionload_img()
. - flavor is a list containing the Operation parameters (detailed in the following section).
- masks is the output of the function
makeTissueMasks()
, a list of four elements corresponding to 1) whole brain mask, 2) white matter mask, 3) cerebrospinal fluid mask and 4) gray matter mask. - imgInfo is a list of 7 elements corresponding to 1) no. of rows in a slice, 2) no. of columns in a slice, 3) no. of slices, 4) no. of time points, 5) the affine matrix, 6) repetition time and 7) the header of the image. These data can be retrieved with function
load_img()
.
A preprocessing pipeline can be fully described with the following data structure:
[
['Operation1', 1, [param1]],
['Operation2', 2, [param1, param2, param3]],
['Operation3', 3, [param1, param2]],
['Operation4', 4, [param1]],
['Operation5', 5, [param1, param2]],
['Operation6', 6, [param1, param2, param3]],
['Operation7', 7, [param]]
]
It is a list of structures. Each structure is a list of 3 elements:
- The operation name.
- The operation order.
- The list of parameters for the specific operation.
Operations are executed following the operation order specified by the user. Note that in case of regression operations, a single regression step combining multiple regressors (e.g., tissue regressors and global signal regressor) can be requested by assigning the same order to all regression steps.
There are three pipelines already specified in the HCP_helpers.py file.
config.operationDict = {
'A': [ #Finn et al. 2015
['VoxelNormalization', 1, ['zscore']],
['Detrending', 2, ['legendre', 3, 'WMCSF']],
['TissueRegression', 3, ['WMCSF', 'GM']],
['MotionRegression', 4, ['R dR']],
['TemporalFiltering', 5, ['Gaussian', 1]],
['Detrending', 6, ['legendre', 3 ,'GM']],
['GlobalSignalRegression', 7, ['GS']]
],
'B': [ #Satterthwaite et al. 2013 (Ciric7)
['VoxelNormalization', 1, ['demean']],
['Detrending', 2, ['poly', 2, 'wholebrain']],
['TemporalFiltering', 3, ['Butter', 0.01, 0.08]],
['MotionRegression', 4, ['R dR R^2 dR^2']],
['TissueRegression', 4, ['WMCSF+dt+sq', 'wholebrain']],
['GlobalSignalRegression', 4, ['GS+dt+sq']],
['Scrubbing', 4, ['RMS', 0.25]]
],
'C': [ #Siegel et al. 2016 (SiegelB)
['VoxelNormalization', 1, ['demean']],
['Detrending', 2, ['poly', 1, 'wholebrain']],
['TissueRegression', 3, ['CompCor', 5, 'WMCSF', 'wholebrain']],
['TissueRegression', 3, ['GM', 'wholebrain']],
['GlobalSignalRegression', 3, ['GS']],
['MotionRegression', 3, ['censoring']],
['Scrubbing', 3, ['FD+DVARS', 0.25, 5]],
['TemporalFiltering', 4, ['Butter', 0.009, 0.08]]
],
}
- zscore: convert each voxel’s time course to z-score (remove mean, divide by standard deviation).
- demean: substract the mean from each voxel's time series.
Example: ['VoxelNormalization', 1, ['zscore']]
- poly: polynomial regressors up to specified order.
- Specify polynomial order.
- Specify tissue, one of 'WMCSF' or 'GM'.
- legendre: Legendre polynomials up to specified order.
- Specify polynomial order
- Specify tissue, one of 'WMCSF' or 'GM'.
Example: ['Detrending', 2, ['poly', 3, 'GM']]
Note: R = [X Y Z pitch yaw roll]
Note: if temporal filtering has already been performed, the motion regressors are filtered too.
Note: if scrubbing has been requested, a regressor is added for each volume to be censored; the censoring option performs only scrubbing.
- R dR: translational and rotational realignment parameters (R) and their temporal derivatives are used as explanatory variables in motion regression.
- R dR R^2 dR^2:: realignment parameters (R) with their derivatives (dR), quadratic terms (R^2) and square of derivatives (dR^2) are used in motion regression.
- censoring: for each volume tagged for scrubbing, a unit impulse function with a value of 1 at that time point and 0 elsewhere is included as a regressor.
Example: ['MotionRegression', 3, ['R dR']]
Note: this step only flags the volumes to be censored, that are then regressed out in the MotionRegression step.
Note: uncensored segments of data lasting fewer than 5 contiguous volumes, are flagged for removal as well.
Note: the default head radius used for computing FD is 50mm (as in Powers et al. 2012). This value can be changed assigning a new value to config.headradius.
- FD+DVARS
- Specify a threshold for framewise displacement (FD) in mm.
- Specify a threshold t s.t. volumes with a variance of differentiated signal (DVARS) greater than (100 + t)% of the run median DVARS are flagged for removal.
- Specify number of adjacent volumes to exclude (optional).
- RMS
- Specify threshold for root mean square displacement in mm.
- Specify number of adjacent volumes to exclude (optional).
Example: ['Scrubbing', 4, ['FD+DVARS', 0.25, 5]]
- GM: the gray matter signal is added as a regressor.
- Specify if regression should be performed on gray matter signal ('GM') or whole brain signal ('wholebrain')
- WMCSF: white matter and cerebrospinal fluid signals are added as regressors.
- Specify if regression should be performed on gray matter signal ('GM') or whole brain signal ('wholebrain')
- WMCSF+dt+sq: white matter and cerebrospinal fluid signals with their derivatives and quadratic terms are added as regressors.
- Specify if regression should be performed on gray matter signal ('GM') or whole brain signal ('wholebrain')
- CompCor: a PCA-based method (Behzadi et al., 2007) is used to derive N components from CSF and WM signals.
- Specify no. of components to compute for specified tissue mask (see following parameter).
- Specify if components should be computed using a single mask for white matter and cerebrospinal fluid ('WMCSF') or separatedly for white matter and cerebrospinal fluid ('WM+CSF')
- Specify if regression should be performed on gray matter signal ('GM') or whole brain signal ('wholebrain').
Example: ['TissueRegression', 5, ['CompCor', 5, 'WMCSF', 'wholebrain']]
- GS: the global signal is added as a regressor.
- GS+dt+sq: the global signal with its derivative and square terms of global signal and derivative are added as regressors.
Example: ['GlobalSignalRegression', 6, ['GS+dt+sq']]
Note: if scrubbing has been requested, censored volumes are replaced by linear interpolation.
- Butter: Butterworth band pass filtering.
- Specify high pass threshold.
- Specify low pass threshold.
- Gaussian: Low pass Gaussian smoothing.
- Specify standard deviation.
- DCT: Discrete cosine transform.
- Note: this is implemented through regression, in order to be performed in a single regression step together with the removal of other confounds. If this is not the case this step should be performed at the end.
Example: ['TemporalFiltering', 7, ['Butter', 0.009, 0.08]]
- 04/24/2018: Fixed indexing bug in the feature selection step (lines 1389-1391 in HCP_helpers.py). Prior to this date the feature selection step was performed on features indexed by range(1,n_edges) instead of range(0,n_edges), always discarding the first (0-indexed) feature. As a consequence, there might be slight discrepancies in analyses run before and after the fix.