Follow BIDS-Derivatives convention for outputs
tsalo opened this issue · 11 comments
Is your feature request related to a problem? Please describe.
BIDSifying rapidtide outputs will make it easier to integrate rapidtide with fMRIPrep and other BIDS Apps.
Describe the solution you'd like
Generated files could follow the BIDS-Derivatives convention, both in terms of the filenames and the file formats. Additionally, it would be great if the documentation included an "Outputs of rapidtide" page. Tedana's version uses tables, while fMRIPrep's version is more expansive and text-oriented.
Describe alternatives you've considered
The documentation under "Usage" could be expanded instead of adding a new page just for describing workflow outputs. This might be better for rapidtide, given the sheer number of different workflows that are included in the package. In tedana and fMRIPrep's cases, they only support one or two workflows for which the outputs need documentation.
Additional context
This request stems from nipreps/fmriprep#1678 (comment).
Here are some initial thoughts on the files you outlined in nipreps/fmriprep#1678, based on bids-standard/bids-specification#519:
- `_lagregressors.nii.gz` --> `X_desc-rapidtide_timeseries.nii.gz`
- final regressor timecourse --> `X_desc-rapidtide_timeseries.tsv` (would conflict with the above)
- `_lagtimes.nii.gz` --> Not sure
- `_lagstrengths.nii.gz` --> Not sure
- `_lagsigma.nii.gz` --> `X_desc-rapidtide_std.nii.gz`
- `_lagmask.nii.gz` --> `X_desc-rapidtide_mask.nii.gz`
I've made a stab at this. There is a new option, `--bidsoutput`, that should put pretty much everything into BIDS-compatible files (the exceptions are some text files specifying the command that was run).
The code base is so huge that I'm having some trouble hunting down all of the new outputs. Are they summarized anywhere (e.g., in the documentation, or listed as targets for a test)? If not, I could probably fetch the `dev` branch and run it on some data locally to figure out the final list.
One thing that I didn't see was associated json files. Those JSONs are optional, but they'd be pretty helpful, I think. Specifically, two fields that would be good to have are `RawSources` and `Units`. The `RawSources` would just be the inputs to the files, I think, and then if those inputs are actually not raw BIDS files (or are not final derivatives), then it would be on the tool (e.g., fMRIPrep) to chain the series of `RawSources` together to trace back to the actual BIDS files. Some of the files (like the lags) should be in seconds, so having `Units` there would be helpful. There are probably a few other common fields that are worth including, but I'd need to read through the outputs to figure that out.
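Just as a rough sketch of what I mean (the `RawSources` path and values here are made up for illustration, not actual rapidtide outputs), a sidecar for the lag-time map could look something like this:

```python
import json

# Hypothetical sidecar for the lag-time map; the RawSources entry is a
# placeholder for whatever preprocessed file was actually fed to rapidtide.
sidecar = {
    "RawSources": ["sub-01/func/sub-01_task-rest_desc-preproc_bold.nii.gz"],
    "Units": "s",  # lag times are in seconds
}

with open("dgsr_bids_desc-lagtimes_map.json", "w") as f:
    json.dump(sidecar, f, indent=4, sort_keys=True)
```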
Ok, I've added json sidecars for all the nifti files ending in `_map.nii.gz` (which are the main output files). I put in `RawSources` and `Units` (although they are "None" for anything but lagtimes and lagsigma). I also added a `CommandLineArgs`, which shows you all the options used to invoke rapidtide.
Based on a test run, here are the output files:
dgsr_bids_commandline.txt
dgsr_bids_desc-MTT_map.json
dgsr_bids_desc-MTT_map.nii.gz
dgsr_bids_desc-R2_map.json
dgsr_bids_desc-R2_map.nii.gz
dgsr_bids_desc-autocorr_regressor.json
dgsr_bids_desc-autocorr_regressor.tsv.gz
dgsr_bids_desc-corrdistdata_info.json
dgsr_bids_desc-corrdistdata_info.tsv.gz
dgsr_bids_desc-corrout_info.nii.gz
dgsr_bids_desc-failreason_map.json
dgsr_bids_desc-failreason_map.nii.gz
dgsr_bids_desc-fitNorm_map.json
dgsr_bids_desc-fitNorm_map.nii.gz
dgsr_bids_desc-fitR2_hist.json
dgsr_bids_desc-fitR2_hist.tsv.gz
dgsr_bids_desc-fitcoff_map.json
dgsr_bids_desc-fitcoff_map.nii.gz
dgsr_bids_desc-fitmask_map.json
dgsr_bids_desc-fitmask_map.nii.gz
dgsr_bids_desc-fitwindow_info.nii.gz
dgsr_bids_desc-fmrires_regressor.json
dgsr_bids_desc-fmrires_regressor.tsv.gz
dgsr_bids_desc-gaussout_info.nii.gz
dgsr_bids_desc-globallag_hist.json
dgsr_bids_desc-globallag_hist.tsv.gz
dgsr_bids_desc-globalmean_mask.nii.gz
dgsr_bids_desc-lagsigma_map.json
dgsr_bids_desc-lagsigma_map.nii.gz
dgsr_bids_desc-lagstrengths_map.json
dgsr_bids_desc-lagstrengths_map.nii.gz
dgsr_bids_desc-lagtimes_hist.json
dgsr_bids_desc-lagtimes_hist.tsv.gz
dgsr_bids_desc-lagtimes_map.json
dgsr_bids_desc-lagtimes_map.nii.gz
dgsr_bids_desc-lfofiltered_bold.nii.gz
dgsr_bids_desc-maxcorr_hist.json
dgsr_bids_desc-maxcorr_hist.tsv.gz
dgsr_bids_desc-meanvalue_map.json
dgsr_bids_desc-meanvalue_map.nii.gz
dgsr_bids_desc-nullsimfunc_hist.json
dgsr_bids_desc-nullsimfunc_hist.tsv.gz
dgsr_bids_desc-origres_regressor.json
dgsr_bids_desc-origres_regressor.tsv.gz
dgsr_bids_desc-plt0p001_mask.nii.gz
dgsr_bids_desc-plt0p005_mask.nii.gz
dgsr_bids_desc-plt0p010_mask.nii.gz
dgsr_bids_desc-plt0p050_mask.nii.gz
dgsr_bids_desc-processed_mask.nii.gz
dgsr_bids_desc-r2value_map.json
dgsr_bids_desc-r2value_map.nii.gz
dgsr_bids_desc-refine_mask.nii.gz
dgsr_bids_desc-refined_regressor.json
dgsr_bids_desc-refined_regressor.tsv.gz
dgsr_bids_desc-resampres_regressor.json
dgsr_bids_desc-resampres_regressor.tsv
dgsr_bids_desc-resampres_regressor.tsv.gz
dgsr_bids_desc-rvalue_map.json
dgsr_bids_desc-rvalue_map.nii.gz
dgsr_bids_desc-width_hist.json
dgsr_bids_desc-width_hist.tsv.gz
dgsr_bids_formattedcommandline.txt
dgsr_bids_memusage.csv
dgsr_bids_options.json
dgsr_bids_pass-1_desc-despeckle_mask.nii.gz
dgsr_bids_pass-2_desc-despeckle_mask.nii.gz
dgsr_bids_pass-3_desc-despeckle_mask.nii.gz
dgsr_bids_runtimings.txt
A few thoughts I have are:
- `_regressor` --> `_timeseries`: This is best outlined in the open pull request for functional derivatives (bids-standard/bids-specification#519).
- `.tsv.gz` --> `.tsv`: I believe that gzipped tsv files are standard for highly sampled time series, like physio data, but time series around the resolution of the TR don't generally get gzipped.
- I'm not really sure what the difference between `desc-rvalue_map` and `desc-R2_map` is.
- `_desc-meanvalue_map` --> `_mean`: The `mean` suffix is added in bids-standard/bids-specification#519.
- `"Units": "None"` --> `"Units": "arbitrary"`: This is a recent addition to the specification, since arbitrary units are not part of the standard unit systems used by the specification.
- When `SamplingFrequency` is the same as the input fMRI data, you can use the special keyword "TR", which is added in that PR I mentioned above.
- In the json files linked to tsv files, the columns should be at the top level (i.e., no `"Columns"` key). The way you have it set up is more like physio data, where the tsv wouldn't contain the column names at all, so they have to be set in the json. In rapidtide's case, those tsv files are more like fMRIPrep's confounds files (even the upsampled ones), so the column names are in the tsv and the jsons should be structured using the more basic tabular file rules (see the sketch after this list).
- `desc-fitmask_map` --> `desc-fit_mask`: You could include mask-specific metadata (see here), especially "Type" to give more info about what a "fit" mask corresponds to. Maybe something like "Brain voxels with sufficient model fit"? I'm not sure if sentences are recommended, since the reserved values are all single words.
- `desc-fitcoff_map` --> `desc-fitcoeff_map`: I could be misunderstanding what this is, but I assumed it was parameter estimates from the model?
- `pass` isn't a supported entity, but it seems necessary in this case. I would only recommend adopting a reserved keyword in this case to make it easier to grab the last pass programmatically. Something like `pass-final`, maybe?
- `desc-refine_mask` and `desc-refined_regressor`: Should the mask maybe also be `desc-refined`?
- ~~I've messaged @effigies about the possibility of using the `res` entity in time series files. If he says it fits with the prescribed usage, then that could clean up the naming of a lot of the resampled versions of time series here.~~ Never mind, it's a no-go on using `res`, since `res` is specifically about spatial resolution. I think `desc` is the way to go for these, at the moment.
- I think we could also take some cues from fitlins/BIDS-Models, which are still in flux, but could be useful for model fit-type stuff. Take a look here. Basically, for the fit coefficients and R^2 values, you could use `_stat-effect_statmap` and `_stat-rSquare_statmap`, respectively. There's the possibility that that will change, but I think it's a good starting point.
- Finally, I unfortunately don't know what several of the maps are, so I don't have any useful feedback on those. Hopefully as I dig into things, I'll gain a better understanding of the outputs.
I hope this helps. I'm not the Derivatives expert among the BIDS maintainers, so I could be wrong about some of these recommendations.
A few thoughts I have are:
- Done
- Done
- This is a result of a bug in mapping internal variable names to file names. Fixed.
- Done
- This is not required, though, right? If one of the tsv files gets separated from the other files, I want to still know what the sample rate is.
- I'll have to look at some examples to figure out what to do here.
- Name changed. Not sure what I'll do about keywords.
- Yes. Just sloppy naming on my part. Fixed.
- Not sure what to do here. I think I'd rather combine the masks into one file.
- Sloppy naming again. I'll go with the change you suggest, though the two things are related but different.
- Yes, a temporal resolution entity would be good. "sampletime"?
- I'll try that.
- The maps should be renamed to be more informative - these are historical names that I've used for quite some time, and need revision. My current thinking is:
| CurrentName | Meaning | NewName |
| --- | --- | --- |
| lagtimes | time of maximum correlation | maxtime_map |
| lagstrengths | maximum correlation value (R) | maxcorr_map (although now correlation is just one of a few similarity metrics) |
| lagsigma | halfwidth of correlation peak | maxwidth_map |
| R2 | maximum correlation squared | maxcorrsq_map |
| MTT | estimate of mean transit time | MTT_map (this is the accepted term for what I'm trying to calculate) |
| fitmask | mask showing where the correlation fit succeeded | corrfit_mask |
This is not required, though, right? If one of the tsv files gets separated from the other files, I want to still know what the sample rate is.
Right, the "TR" keyword is entirely optional.
Yes, a temporal resolution entity would be good. "sampletime"?
I've opened an issue about this in the electrophysiology derivatives BEP repository (bids-standard/bep021#2), but, from what I've heard, they will probably be slow to respond. I think it's probably best to use the `desc` field to denote the resolution (as you're already doing), with the caveat that there may be a better way in the future.
The maps should be renamed to be more informative - these are historical names that I've used for quite some time, and need revision.
Personally, I think "lag time" is easier to interpret than "maximum correlation time" since it's how users will think of it, but I guess it's a matter of choosing between the interpretation and the actual value.
Ultimately, I think that MTT will be its own suffix (either `_MTT`, `_mtt`, or `_MTTmap`), since, as you said, it's an accepted term and an established output of certain things. I also noticed that the Units for this file are "arbitrary", but isn't this map in seconds?
One other question: is `desc-lfofilterResult_bold.nii.gz` the voxel-wise, lagged LFO confound regressor or the denoised BOLD data after confound regression? If it's the former, then I think a better name would be something like `desc-lfoConfound_timeseries.nii.gz` (you can have `timeseries.nii.gz`).
Other than that, it all looks very good to me. The `_hist`, `_info`, and `_map` suffixes are not supported, but I think you have found good alternatives that work nicely around the current limits of BIDS Derivatives.
Personally, I think "lag time" is easier to interpret than "maximum correlation time" since it's how users will think of it, but I guess it's a matter of choosing between the interpretation and the actual value.
You may be right - let me think about that.
Ultimately, I think that MTT will be its own suffix (either _MTT, _mtt, or _MTTmap), since, as you said, it's an accepted term and an established output of certain things. I also noticed that the Units for this file are "arbitrary", but isn't this map in seconds?
Yes it is - fixed.
One other question- is desc-lfofilterResult_bold.nii.gz the voxel-wise, lagged LFO confound regressor or the denoised BOLD data after confound regression? If it's the former, then I think a better name would be something like desc-lfoConfound_timeseries.nii.gz (you can have timeseries.nii.gz).
It's the cleaned dataset. This was kind of confusing, so I've changed the names a bit. Now `desc-lfofilterCleaned.nii.gz` is the name of the denoised BOLD data after confound regression, and `desc-lfofilterRemoved.nii.gz` is the voxel-wise, lagged LFO confound regressor (this file is only saved if the `--nolimitoutput` flag is selected).
I'm working on a table for the documentation with clear explanations of what file is what. To get decent tables, I need to use markdown (can't figure out how to make nice tables in .rst). The file "candidate_usage.md" has the modifications so far.
It's the cleaned dataset. This was kind of confusing, so I've changed the names a bit. Now `desc-lfofilterCleaned.nii.gz` is the name of the denoised BOLD data after confound regression, and `desc-lfofilterRemoved.nii.gz` is the voxel-wise, lagged LFO confound regressor (this file is only saved if the `--nolimitoutput` flag is selected).
Ah, that makes sense. Thank you for the clarification.
I'm working on a table for the documentation with clear explanations of what file is what. To get decent tables, I need to use markdown (can't figure out how to make nice tables in .rst). The file "candidate_usage.md" has the modifications so far.
Do these tables look alright? If so, I can help with the RST tables.
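If it would help on the RST side, one low-effort option (just a sketch, assuming the third-party `tabulate` package is acceptable as a one-off conversion tool, and using made-up rows) is to generate the RST tables from the same information programmatically:

```python
from tabulate import tabulate  # third-party package, used here as a one-off converter

# Example rows only; the real content would come from candidate_usage.md.
rows = [
    ("desc-maxtime_map.nii.gz", "Time of maximum correlation, in seconds"),
    ("desc-maxcorr_map.nii.gz", "Maximum correlation value (R)"),
]

# tablefmt="rst" emits a simple reStructuredText table that Sphinx renders as-is.
print(tabulate(rows, headers=["File", "Description"], tablefmt="rst"))
```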