ME-ICA/aroma

Produce BIDS Derivatives-compatible outputs

tsalo opened this issue · 8 comments

tsalo commented

We have a set of outputs of interest from AROMA, including:

  • Denoised 4D image, in native and standard space (possibly)
  • Metrics (feature scores)
  • Component time series
  • Component maps, in native and standard space (possibly)
  • Classifications (included with feature scores in "classification_overview.txt")
  • Figure

We should figure out how best to make those outputs BIDS Derivatives-compatible.

At minimum, I think we can use filenames that are basically BIDS-ish, like what I propose in ME-ICA/tedana#574. This means using entities and suffixes that match BIDS convention, minus the "source entities" from the original files (e.g., sub, ses, run).

I don't know if we want to output the classifications/metrics as a json (as in tedana) or a tsv. A tsv would be easier to read...

tsalo commented

I'm currently leaning toward the following outputs:

  • desc-ICA_decomposition.json
  • desc-ICA_mixing.tsv
  • desc-ICA_components.nii.gz
  • desc-ICAGammaThresholded_components.nii.gz
  • desc-smoothAROMAnonaggr_bold.nii.gz
  • AROMAnoiseICs.csv (if we keep them)
  • desc-AROMA_metrics.tsv (instead of the json)
  • desc-AROMA_metrics.json (metadata about metrics)
  • aroma.svg

What about the folder structure? My guess is that users will run it in their data folder, say sub-001/ses-01/func/. We should create a derivatives folder or something, right?

tsalo commented

Since we're planning to use this as a node in a BIDS App (e.g., fMRIPrep), I think we should dump things into an arbitrarily-named output folder. We can always develop a BIDS App version of the CLI at some point, if there's a demand. It's in the BIDS App version where I think we'd want to try to extract subject and session information, and then develop a folder and filename structure around that.

Since we're planning to use this as a node in a BIDS App (e.g., fMRIPrep), I think we should dump things into an arbitrarily-named output folder.

Yes, it is trivial then to have some manager code that transfers everything to the right position and applies bids-derivatives naming.
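A minimal sketch of what such manager code might look like, assuming AROMA has already written its entity-free outputs into a single folder. The function name `add_source_entities` and the `source_entities` prefix are hypothetical and not part of AROMA or fMRIPrep; a real BIDS App would derive the prefix from the input file.

```python
import shutil
from pathlib import Path


def add_source_entities(aroma_dir, deriv_dir, source_entities):
    """Copy AROMA outputs into deriv_dir, prefixing names with source entities.

    source_entities is a hypothetical prefix such as "sub-001_ses-01_task-rest".
    Returns the list of new filenames, in sorted order of the original names.
    """
    aroma_dir, deriv_dir = Path(aroma_dir), Path(deriv_dir)
    deriv_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(aroma_dir.iterdir()):
        if not src.is_file():
            continue  # this sketch ignores subfolders
        dst = deriv_dir / f"{source_entities}_{src.name}"
        shutil.copy2(src, dst)
        copied.append(dst.name)
    return copied
```

Called as `add_source_entities("aroma_out", "derivatives/sub-001/ses-01/func", "sub-001_ses-01_task-rest")`, this would turn `desc-ICA_mixing.tsv` into `sub-001_ses-01_task-rest_desc-ICA_mixing.tsv`. Files without a BIDS-style suffix (e.g., `aroma.svg`) would still need a proper rename rule.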

I'm currently leaning toward the following outputs:

  • desc-ICA_decomposition.json
  • desc-ICA_mixing.tsv
  • desc-ICA_components.nii.gz
  • desc-ICAGammaThresholded_components.nii.gz
  • desc-smoothAROMAnonaggr_bold.nii.gz
  • AROMAnoiseICs.csv (if we keep them)
  • desc-AROMA_metrics.tsv (instead of the json)
  • desc-AROMA_metrics.json (metadata about metrics)
  • aroma.svg

Yes, don't make it overly complicated - this should do!

@tsalo @oesteban your comments sound good to me!

tsalo commented

Here is what I'm thinking the contents of the files should look like (cross-posted and lightly adapted from ME-ICA/tedana#649):

desc-ICA_mixing.tsv (required)

ica_00 ica_01 ica_02 ica_03 ica_04
0 1.96128 0.715816 1.81138 2.75372 1.84941
1 0.15217 -0.85096 -0.101651 0.147915 -0.363037
2 0.326455 -0.392816 1.1161 0.450553 0.580335
3 0.0141179 -0.871712 1.02556 0.678525 -0.0628587
4 -0.0459112 -1.02132 1.10107 0.306312 -0.400553

desc-ICA_decomposition.json (required)

{
    "Method": "Independent components analysis with MELODIC ICA algorithm implemented by FSL. Components are sorted by variance explained in descending order.",
    "ica_00": {
        "Description": "ICA fit to dimensionally-reduced data.",
        "Method": "AROMA"
    },
    "ica_01": {
        "Description": "ICA fit to dimensionally-reduced data.",
        "Method": "AROMA"
    },
    "ica_02": {
        "Description": "ICA fit to dimensionally-reduced data.",
        "Method": "AROMA"
    },
    "ica_03": {
        "Description": "ICA fit to dimensionally-reduced data.",
        "Method": "AROMA"
    },
    "ica_04": {
        "Description": "ICA fit to dimensionally-reduced data.",
        "Method": "AROMA"
    }
}
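Since every component entry in the example above is identical, the file could be generated rather than written by hand. This is only a sketch under that assumption: the function names are hypothetical, and widening the zero-padding beyond two digits for large decompositions is my own guess, not something decided in this thread.

```python
import json

METHOD = (
    "Independent components analysis with MELODIC ICA algorithm implemented "
    "by FSL. Components are sorted by variance explained in descending order."
)


def build_decomposition_metadata(n_components):
    """Return a dict shaped like the desc-ICA_decomposition.json example."""
    # Assumption: pad to at least two digits, wider if there are 100+ components.
    width = max(2, len(str(n_components - 1)))
    metadata = {"Method": METHOD}
    for i in range(n_components):
        metadata[f"ica_{i:0{width}d}"] = {
            "Description": "ICA fit to dimensionally-reduced data.",
            "Method": "AROMA",
        }
    return metadata


def write_decomposition_metadata(path, n_components):
    """Write the metadata to path as indented JSON."""
    with open(path, "w") as f:
        json.dump(build_decomposition_metadata(n_components), f, indent=4)
```

The component keys produced here are the same names used as column headers in desc-ICA_mixing.tsv.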

desc-AROMA_metrics.tsv

Component max_RP_corr HFC edge_fract csf_fract classification rationale
ica_00 0.2 0.8163 0.507241 0.00403574 rejected hfc
ica_01 0.43 0.1357 0.569411 0.00297853 accepted n/a
ica_02 0.7 0.4905 0.33585 0.00195625 rejected hyperplane
ica_03 0.1 0.7926 0.517262 0.00306524 accepted n/a
ica_04 0.25 0.2831 0.32673 0.00231219 accepted n/a

desc-AROMA_metrics.json

{
    "Component": {
        "Description": "The unique identifier of each component. This identifier matches column names in the mixing matrix TSV file.",
        "LongName": "Component identifier"
    },
    "classification": {
        "Description": "Classification from the classification procedure.",
        "Levels": {
            "accepted": "A component determined to be unrelated to motion. Included in denoised data.",
            "ignored": "A low-variance component included in denoised data.",
            "rejected": "A motion-related component excluded from denoised data."
        },
        "LongName": "Component classification"
    },
    "max_RP_corr": {
        "Description": "The maximum robust correlation of each component time series with a model of 72 realignment parameters.",
        "LongName": "Maximum motion parameter correlation"
    },
    "HFC": {
        "Description": "The frequency, as a fraction of the Nyquist frequency, at which the higher and lower frequencies explain half of the total power between 0.01 Hz and Nyquist.",
        "LongName": "High-frequency content"
    }
}
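The "Component" description says the identifiers match the column names of the mixing matrix, so the two tables can be joined on that name alone. A rough sketch of that lookup, assuming both files are tab-separated with a header row and that the mixing TSV has no index column; the function names are hypothetical.

```python
import csv
import io


def rejected_components(metrics_tsv_text):
    """Return the identifiers of components classified as 'rejected'."""
    reader = csv.DictReader(io.StringIO(metrics_tsv_text), delimiter="\t")
    return [row["Component"] for row in reader if row["classification"] == "rejected"]


def select_columns(mixing_tsv_text, names):
    """Return the mixing time series (one list per volume) for the named columns."""
    reader = csv.DictReader(io.StringIO(mixing_tsv_text), delimiter="\t")
    return [[float(row[name]) for name in names] for row in reader]
```

With the example tables above, `rejected_components` would pick out `ica_00` and `ica_02`, and `select_columns` would return the corresponding two columns of the mixing matrix, i.e., the noise regressors. Nothing beyond the shared naming convention ties the files together.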

@tsalo This makes sense to me. There's no clear connection between metrics.tsv and mixing.tsv, so if this is something you'll expect to make a standard derivative in the future, I could see that as being something people would ask about. But I'm not sure it's worth trying to predict how that conversation will go before it's happened.

tsalo commented

@effigies Thanks! You're right about the lack of connection. I'm hoping that we can come up with a way to fix that in the long term, but I'm glad this is good for the meantime.