TidyDicom is a simple program based on Python . It can categorize all DICOM files in the root directory into specific folders.
Welcome any issues and PR.
The Windows version based on embeddable Python is uploaded. It can run without python pre-installed.
The below modules must be installed first to make the TidyDicom work.
- pydicom=2.4.3
- tqdm=4.66.4
Suppose that we have a root folder full of DICOM files and we need to categorize these DICOM files into specific folders. For example, if we want the first layer of the directory tree to be the patient's name and patient's ID, the second layer to be the study date, and the second layer to be the series description. DICOM files with the same patient's name, patient's ID, study date, and series description will go to the same folder.
Open the custom.py
file:
Firstly, we need to tell the program which DICOM data element to read to get the informations.
INFO_DICT = {
'PatientName':{
'standard_name':'PatientName',
'default_value':'NoPatientNameTag'
},
'PatientID':{
'standard_name':'PatientID',
'default_value':'NoPatientIDTag'
},
'StudyDate':{
'group_id':'0008',
'element_id':'0020',
'default_value':'NoStudyDateTag'
},
'SeriesDescription':{
'group_id':'0008',
'element_id':'103E',
'default_value':'NoSeriesDescriptionTag'
}
}
We defined a Python dictionary in which the keyword is a string of anything and the value is used to get the tag information.
Standard DICOM data elements can be accessed by standard keyword like PatientName
for (0010,0010) Patient's Name
and PatientID
for (0010,0020) | Patient ID
. It can also be accessed by group number and element number like group_id=0008, element_id=0020
for (0008,0020) Study Date
and group_id=0008, element_id=103E
for (0008,103E) Series Description
.
Private DICOM data elements can only be accessed by group number and element number.
A list of keywords for all standard elements can be found here.
The program uses the INFO_DICT
and returns a Python dictionary info
containing the element informations.
Secondly, we need to tell the program what is the rule to categorize the DICOM files.
def GetPathByInfo(info:Dict[str, Any]) -> str:
pid = info['PatientID'] # no change
pname = info['PatientName'].components # PearsonName object, return tuple
pname = ''.join(pname).lower().replace(' ', '')
date = info['StudyDate'] # no change
series_description = info['SeriesDescription'] # no change
return os.path.join(id+'_'+pname, date, series_name) # using os.path.join is a safe method for all OS.
Suppose a DICOM file is named 1234.dcm
and the program read it and returns the info = {'PatientName':'Jackson', 'PatientID':'111111', 'StudyDate':20211202, 'SeriesDescription':'arterial'}
. The function above will return '111111_Jackson/20211202/arterial'
. Finally the file will be move to './111111_Jackson/20211202/arterial/1234.dcm'
.
Open the config.json
file:
we need to tell the program where is the root folder and where we want to move the DICOM files to.
{
"in_dir": "C:\\Users\\J\\Desktop\\mess",
"out_dir": "C:\\Users\\J\\Desktop\\clean",
"copy_or_cut": "copy",
"custom_file":"custom",
"batch_size":2000
}
The in_dir
is the root folder of your unsorted DICOM files. The out_dir
is the root folder of your sorted DICOM files. copy_or_cut
defines the method when moving the files, by either shutil.copy
or shutil.move
. custom_file
is the filename of the custom.py. You can write different custom files (e.g., custom_a
for custom_a.py
, custom_b
for custom_b.py
) to handle different DICOMs. batch_size
is the number of files processed in one batch.
Run the main.py
and the program will start to work.
The terminal will display the process bar batch by batch.
When any error occurs, a failures.csv
will be stored in out_dir
logging the filepath and error information.
- download
TidyDicom.zip
and unzip it. - modify
custom.py
andconfig.json
. - double click
run.bat
This project is licensed under the GPL 3.0 License