This repository houses a utility to make BIDS TSV and JSON files out of ABCD Release data. It is currently tested and working on the NIH helix
system with the ABCD Release 4.0 data.
Be aware of this tool's resource consumption. In a test performed 5/26/2022 with about 400 variables/fields in the selections.txt file, the tool used about 45 GB of memory on 1 CPU for 3 hours to produce the output files. If you need 400 or so variables, I recommend running this on an HPC cluster with 64 GB of memory, 1 CPU, and a 4-hour time limit. Tests with about 30 variables ran with negligible resource consumption.
You can copy all of the following text in the code block and paste it directly into a terminal on the NIH helix system. As long as you have access to /data/ABCD_DSST/ABCD_BIDS/tabulated_data/release4, the tabulate.py code will work.
# clone this repository
git clone https://github.com/ericearl/nda-abcd-tabulate.git
# change into the repository's directory
cd nda-abcd-tabulate
# load the python module
module load python
# make sure pandas is installed
python -m pip install pandas --user
# create the selected tabulated data
python tabulate.py
The tabulate.py code takes the selections.txt file as an input list of fields from which to create a JSON and TSV (not including the defaults: subjectkey, eventname, and interview_age).
Edit the selections.txt file in place so that it contains only one field per line, each named exactly as it appears in the ABCD_participants.json file in this same GitHub repository. For example, a selections.txt file that contained this...
site
sex
... would create as output a TSV file with the five columns: participant_id, eventname, interview_age, site, and sex.
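To preview which columns a given selections file should produce before committing to a long run, the mapping described above can be sketched in a few lines of Python. This is only an illustration, not the code inside tabulate.py: the helper names `read_selections` and `expected_columns` are hypothetical, and the sketch assumes the default columns always come first and that selected fields follow in file order.

```python
from pathlib import Path

# Default columns as named in the example output above.
# Assumption: they always appear first, in this order.
DEFAULT_COLUMNS = ["participant_id", "eventname", "interview_age"]


def read_selections(path):
    """Return the non-empty, whitespace-stripped field names, one per line."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def expected_columns(selections):
    """Defaults first, then selected fields in file order, skipping duplicates."""
    columns = list(DEFAULT_COLUMNS)
    for field in selections:
        if field not in columns:
            columns.append(field)
    return columns
```

For the two-line example above, `expected_columns(["site", "sex"])` returns the five column names `participant_id`, `eventname`, `interview_age`, `site`, and `sex`, in that order.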
Known or discovered issues with this repository's files or functionality should be reported on this repository's Issues page. Some are already known and are a work in progress, such as:
- Choose any file on the system as input instead of only selections.txt
- Choose any folder on the system to output the BIDS TSV and JSON files instead of only here as abcd.{tsv,json}
- Choose your own ABCD Release 4.0 folder as input for finding the folder of NDA ABCD Release TXT file data
- Error-check and fail gracefully for any and all mistyped field selections
- Suggest a similarly named field based on available fields if an incorrect field name is entered
- Suppress expected and unconcerning warning messages
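Until the error-checking and name-suggestion items above are implemented, a selections list can be checked by hand before a long run. The following is a minimal sketch using the standard library's `difflib`, not part of tabulate.py: the function name `check_selections` is hypothetical, and loading the available field names from the top-level keys of ABCD_participants.json is an assumption about that file's layout.

```python
import difflib


def check_selections(selections, available_fields):
    """Split selections into valid names and a dict of unknown name -> suggestions."""
    available_fields = list(available_fields)
    available = set(available_fields)
    valid = []
    unknown = {}
    for field in selections:
        if field in available:
            valid.append(field)
        else:
            # Up to three close matches; the 0.6 similarity cutoff is difflib's default.
            unknown[field] = difflib.get_close_matches(
                field, available_fields, n=3, cutoff=0.6
            )
    return valid, unknown


# Hypothetical usage, assuming the JSON's top-level keys are the field names:
#   import json
#   with open("ABCD_participants.json") as fh:
#       available = list(json.load(fh))
#   valid, unknown = check_selections(["site", "sexx"], available)
```

With available fields `site`, `sex`, and `interview_age`, the mistyped selection `sexx` is reported as unknown with `sex` as its suggested replacement, while `site` passes through as valid.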
Thanks to Kathy, Shau-Ming, Shane, Dustin, Adam, and the rest of the team for inspiring this improvement to the ABCD NDA tabulated data experience. Work provided by Eric Earl and the NIMH Data Science & Sharing Team.