
This repository contains the code used for converting neuroimaging data (DICOM to nii.gz) to the BIDS standard at the Institute of Epidemiology and Social Medicine, University of Münster.


BiDirect BIDS Converter


This tool is a project of my PhD. I developed it on the complete neuroimaging data of the BiDirect study (Institute of Epidemiology and Social Medicine, WWU, Münster, Germany). It has been tested on Windows and Ubuntu 18.04; conversion has been tested with Siemens and Philips data.

It converts MRI data from DICOM to BIDS in three user interactions (CSV edits). Your input: a folder containing your participants' DICOMs. Your output: a dataset following the BIDS specification that passes the BIDS-Validator.


=====

Short tutorial:


TIP - If you need a detailed step-by-step tutorial on how to work with the client, we recommend checking the Wiki-Tutorial-Page


This is a user-friendly tool to convert your DICOM study data to BIDS.

  • You need a structured naming convention for the folders that contain the DICOMs.
  • You simply run Rscript --vanilla start_BIDS_ConverteR.R /home/path/study_data again and again until every setting and issue is resolved (see the example call after this list).
  • Step 1: setup of subject info (subject-regex, group-regex)
    • pattern_to_remove regex - removes redundant appendices from the subject string
  • Step 2: setup of session info
    • your naming scheme for the sessions, if different from the folder names
  • Conversion (lazy and automated, if the settings above are plausible)
    • nii + json (removal of sensitive information)
    • json (containing all information for internal quality control)
  • Step 3: setup of sequence info
    • identifies all sequence names
    • you have to rename the sequences to the BIDS standard
    • set type (anat/dwi/func)
    • decide on relevance (0: no, 1: yes)
      • the relevant sequences are copied to BIDS
  • Copy2BIDS
    • copies relevant sequences to BIDS
    • creates
      • automated output based on DICOM headers: participants.tsv/.json, TaskName.json
      • template output of: dataset_description.json, README and CHANGES
        • edit these files to fit your study!
  • Check your dataset with BIDS-Validator
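
The whole workflow is driven by a single command that you repeat; each run either stops at the next setting that needs manual editing (Steps 1-3) or continues with the conversion. A minimal example call (the study path is illustrative):

    # repeat until no setting or issue is reported anymore;
    # each run picks up where the last one stopped
    Rscript --vanilla start_BIDS_ConverteR.R /home/path/study_data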

You have a user folder where you can see your settings for session, sequence, and subject information (subject-id and group-id). Your bids/sourcedata folder already contains a participants.tsv extracted from the DICOM header info, plus the other JSON files needed for a valid BIDS dataset. You only need to customize these to your study/authors/grant/license.

User Interaction

  • Follow the statements of the script
  • Edit the files. Please use LibreOffice for native CSV support. Microsoft Excel interprets the input data instead of relying on the CSV structure, so it has problems parsing the CSV into spreadsheets and depends on local language and delimiter settings. There are settings on your system that would enable Excel support on your computer; click on the link.
    • user/settings/lut_study_info.csv - you need an exact match of the subject and group regex (see the stringr sketch after this list)
      • subject regex: "[:digit:]{5}" - translates to: 5 digits specify my subject id
      • group regex: "[:digit:]{1}(?=[:digit:]{4})" - translates to: the first digit, followed by 4 more digits (in a total of 5 digits)
      • pattern to remove, simple: "_your_study|_your_stdy|_yr_study|_my_Study" - translates to: remove each occurrence of these strings (split by "|")
      • pattern to remove, advanced: "_(your|yr|my)_(study|stdy|Study)"
      • pattern to remove, expert: "(?<=[:digit:]{5}).*" - translates to: remove everything that follows the 5-digit subject id
    • user/settings/lut_sessions.csv - name your sessions
    • user/settings/lut_sequences.csv - name your sequences according to the BIDS standard. Please look into the BIDS specification for further information on valid filenames.
  • customize the files (automated output) in the bids/sourcedata directory
    • dataset_description.json - contains general information on your study (authors, grants, acknowledgement, licenses, etc.)
    • README - contains general information on your study
    • CHANGES - changelog of your files
    • participants.tsv - automated extraction of parameters from the DICOM header
    • participants.json - describing the variables of the participants.tsv
    • taskname_bold.json - multiple files, depending on your functional scans
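
A minimal sketch of how the regexes above behave, assuming the stringr package (the same regex dialect referenced in this README); the folder name is illustrative:

    library(stringr)

    folder <- "10234_your_study"   # example folder name

    # strip the redundant appendix first (pattern to remove)
    subject_string <- str_remove_all(folder, "_your_study|_your_stdy|_yr_study|_my_Study")

    str_extract(subject_string, "[:digit:]{5}")                  # subject id: "10234"
    str_extract(subject_string, "[:digit:]{1}(?=[:digit:]{4})")  # group id:   "1"

    # a different naming scheme, e.g. subject ids like AB01:
    str_extract("AB01", "[:alpha:]{2}[:digit:]{2}")              # "AB01"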

The next update is coming! We implemented a Dashboard that shows information about the whole study, the acquired data, detailed information on sequences, header information, and plausibility checks (id, gender, scanner sequence settings, duplicate sequences).

This tool will be developed into an R package.

To do (needs implementation)

  • optional "anonymization.csv"
    • for changing subject ids - left column (old name), right column (new name); see the sketch after this list
  • anonymization using pydeface or fsl_deface for sharing anonymized (header + image) files.
  • consistent debugging information
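
A hypothetical layout for the planned anonymization.csv (not implemented yet; the column names and ids are illustrative):

    old_name,new_name
    10234,70001
    20117,70002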

The algorithm works as described below:

Requirements

  • a study folder as root directory
  • a folder named DICOM, with a folder for each session (your session IDs, e.g. Baseline, FollowUp1, etc.)
  • inside these folders, the subjects of that session, containing the DICOM folders
  • you have to think about your subject nomenclature - if you have a naming scheme (e.g. 5 digits, the first one coding the group); other naming conventions are set via regular expressions later. The expected layout is shown in the tree below.
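
An example of the expected layout (folder and subject names are illustrative):

    study_data/                    <- study folder (root directory)
    └── DICOM/
        ├── Baseline/              <- one folder per session
        │   ├── 10234_your_study/  <- one folder per subject, containing the DICOMs
        │   └── 20117_your_study/
        └── FollowUp1/
            └── 10234_your_study/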

Container start

  • The container is started using the command "docker run - ....." (coming). Here your study folder (containing the DICOM folder) is mounted into the container, and Docker can write files to it.

Output folders and user interaction folders

  • folder creation:
    • BIDS/sourcedata - the output folder for your dataset in BIDS format
      • dont.txt - change to do.txt AFTER checking with the Dashboard and user diagnostics that your settings are working.
    • NII_temp - write-out folder for the anonymized, dcm2niix-converted files and headers. Do NOT delete these files. Each session is a folder containing all subject folders. Take care: this depends on the right subject nomenclature in user_settings/pattern_remove.txt. The files are not in BIDS format, but they are converted to nii.gz.
    • NII_headers - write-out folder only for the DICOM headers (same structure as NII_temp), but these headers are NOT anonymized. They are used for plausibility checks (ID, gender, birthdate, weight, acquisition date).
    • Export_Cooperation
      • export_template file - change the information and rename the file to enable BIDS export for a cooperation partner. The Export_output_BIDS folder is saved here.
    • user_information - write-out folder for information files regarding the renaming and conversion procedures
    • user_diagnostics - write-out folder for diagnostics
    • user_settings - write-out folder for the files that you have to edit manually in a spreadsheet. All these files are checked for unassigned values and inconsistencies, so that the code inhibits the further processing steps. If you have, e.g., subjects that do not fit your subject regex, the code aborts - this functionality keeps your output data clean and affects all "user_settings" files. Debugging messages will be implemented!
      • pattern_remove.txt - is created before the dcm2niix conversion runs. The script aborts here if the information below is not overwritten.
        • subjects: [:digit:]{5} - regular expression indicating 5 digits for the subject name. Find out what your naming convention for the subjects is. If you have subject ids like AB01, you can set this regex: [:alpha:]{2}[:digit:]{2}. For other setups, look into the "stringr Cheat Sheet", page 2, hosted by RStudio.
        • group: regex specifying where the group id sits in the filename. In my case I can extract the first digit from the 5-digit subject id using [:digit:]{1}(?=[:digit:]{4}) - translated to "extract the one digit which is followed by 4 other digits". Please think about adding it to the filename, because further file selection is much easier.
        • remove: here you can add regexes or absolute tags that you want to remove from the folder name, e.g. ",BiDirect" in my case, to get a clearly structured subject id.
        • This file is checked every run to identify the already processed output folders, but also to make sure that your nomenclature works.
      • BIDS_session_mapping.csv -
        • Here you give your sessions a renaming nomenclature if needed (Baseline = 1, FollowUp = 2, or something else).
        • This file is checked every run to identify new sessions.
      • BIDS_mapping.csv - This is the file that needs the most work. You map each of the automatically identified sequences in your dataset to a BIDS standard name (T1w, T2w, FLAIR, etc.); see the sketch after this list. Do NOT add filename extensions (e.g. ".nii" or ".nii.gz"); they are added automatically to add NII and JSON data to your BIDS dataset (a requirement of BIDS). The right nomenclature also identifies the BIDS tags anat/dwi/func based on your input data. If the detection is misleading, just contact me! In the "relevant" column you can then mark which files are relevant (1) or not relevant (0) for you. This determines which output is copied to the BIDS folder! Please check the diagnostics folder to verify that your mapping is correct. Here you can exclude, for instance, Smartbrains or scanner-derived processings.
    • Dashboard - contains the rendered Dashboard if enabled, based on the extracted JSON information. Change dont.txt to do.txt if you want to enable the Dashboard. This is only possible after editing the BIDS_mapping.csv.
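
A hypothetical excerpt of an edited BIDS_mapping.csv (the column and sequence names are illustrative; check the generated file for the exact header):

    sequence_name,bids_sequence_id,type,relevant
    t1_mprage_sag,T1w,anat,1
    ep2d_bold_rest,task-rest_bold,func,1
    localizer,,,0

Here the localizer is marked as not relevant (0), so it is not copied to the BIDS folder.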

You see that you only have to interact with three files! If something is implausible, the tool will in the future give you the exact filename where something is missing.

General information

  • I provided template files that you have to edit manually. I think this makes the tool easier to use.
  • Every time you start the container, all the above steps run. If new subjects have been added to the DICOM folder, you may need to edit the new information in the .csv files or the user_settings folder again. The older information from before is kept. If you delete the files, you need to set them up again to get the process running.
  • The implemented stops are only triggered when manual editing is needed, and a debug message is shown, e.g. when a new subject, session, or sequence was identified.
  • We implemented lazy processing, so that already converted files or already extracted information is NOT processed twice; this enables use from the beginning of a study to its end.
  • If something strange happens, delete every folder other than the DICOM folder and run the script again.

Known issues

  • Issues are mainly based on misleading information/regexes provided by the user with respect to the BIDS standard
  • Philips does not provide Slice-Timing information
  • Issues arise when you change LUT (look-up table) information, e.g. the "relevance" or "bids_sequence_id" of a sequence in an ongoing study
  • It is always safe to delete the "bids" and "nii_temp" directories and start the script again
  • If you delete the user directory, all your manual settings are deleted!