This app processes DICOM files for deidentification. The process includes (1) copying target DICOM files, (2) deidentifying the copied DICOM files, (3) zipping the deidentified DICOM files at a particular directory level, and (4) uploading the zip files to an Amazon S3 bucket.
There are classes in subpackages of edu.umich.med.alzheimers.dicom for performing each of the four steps in isolation:
- ...copy.Copy
- ...deidentify.Deidentify
- ...zip.Zip
- ...upload.Upload
There is also one class for performing all four steps (copy, deidentify, zip, upload): edu.umich.med.alzheimers.dicom.CopyDeidentifyZipUpload.
Whether you use the single- or multi-step process, the DICOMs are processed the same way:
- Copy: The source directory tree containing DICOM files is filtered for specific directories (by name and number of files within) and DICOM files (by name and SeriesDescription attribute value). The filtered source directory tree is then copied to a target directory.
- Deidentify: The DICOM files in the target directory tree are deidentified.
- Zip: The directories containing DICOM files (or DICOM files themselves) in the target directory tree are zipped at a specific tree depth. The resulting zip files are written to another directory.
- Upload: The zip files are uploaded to an S3 bucket.
To minimize file I/O, source directories and files are only copied if they do not already exist in the target directory tree.
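The copy-only-if-missing behavior can be illustrated with a minimal shell sketch. This is not the app's implementation: the function name copy_missing is hypothetical, it filters on file name only (not on SeriesDescription or directory file counts), and it uses grep -E, whose regex dialect differs in places from the Java regexes the app is configured with.

```shell
# Hypothetical sketch: copy files whose names match a regex from a source
# tree to a target tree, skipping any file that already exists in the target.
copy_missing() {
  src_dir=$1
  tgt_dir=$2
  name_regex=$3
  # List files relative to the source root so the tree can be mirrored.
  ( cd "$src_dir" && find . -type f ) | while IFS= read -r rel; do
    base=$(basename "$rel")
    # Keep only files whose names match the regex.
    printf '%s\n' "$base" | grep -Eq "$name_regex" || continue
    # Skip files already present in the target tree.
    [ -e "$tgt_dir/$rel" ] && continue
    mkdir -p "$tgt_dir/$(dirname "$rel")"
    cp "$src_dir/$rel" "$tgt_dir/$rel"
  done
}
```

Running the function a second time with the same arguments copies nothing, because every matching file is already present in the target tree.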
A good grasp of regular expressions is required to configure and use this app effectively. (RegexOne offers a good tutorial on regular expressions.)
- If you haven't already, install Java JDK 1.8 and ensure it's the default JDK.
- To recompile or repackage the JARs, or to run the tests, install sbt. You can install sbt from the official sbt website.
- Navigate to the directory that will be the parent directory of this app.
- Clone this repository with HTTPS or SSH. HTTPS will probably be much easier.
  - HTTPS Clone
    git clone https://git.umms.med.umich.edu/michiganadc/dicom-deidentify.git
  - SSH Clone
    git clone git@git.umms.med.umich.edu:michiganadc/dicom-deidentify.git
- cd into the newly created directory:
  cd ./dicom-deidentify
- Using sbt, make sure the app compiles:
  sbt compile
Run the tests from the command line:
sbt test
Note: This will report failed tests until the app is properly configured.
There are five config files: one for the app overall (package.conf) and one for each of the four steps in the deidentification process (copy.conf, deidentify.conf, zip.conf, and upload.conf). All five belong in src/main/resources/config/. Templates (*.conf.template) have been provided.
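One way to get started is to copy each template to its working name. The sketch below assumes the templates sit in the config directory itself and are named like copy.conf.template; the function name init_configs is hypothetical. Existing config files are left untouched.

```shell
# Hypothetical helper: create each *.conf from its *.conf.template,
# without overwriting a config file that already exists.
init_configs() {
  config_dir=${1:-src/main/resources/config}
  for tmpl in "$config_dir"/*.conf.template; do
    # If the glob matched nothing, the literal pattern is skipped here.
    [ -e "$tmpl" ] || continue
    conf=${tmpl%.template}
    if [ -e "$conf" ]; then
      echo "keeping existing $conf"
    else
      cp "$tmpl" "$conf"
      echo "created $conf"
    fi
  done
}
```

Called with no argument from the app directory, it works on src/main/resources/config; each created file still needs to be edited by hand.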
package.conf:
- appDirPathStr: Path string of the directory where this app resides, likely the dicom-deidentify folder that was created when you cloned this repo.
- intermedDirsRegexArray: Regex string array of directories between the sourceDirPathStr in copy.conf (see below) and the target DICOM files.
- dicomFilenameRegexArray: Regex string array of DICOM file names to include for processing.
- seriesDescriptionRegexArray: Regex string array of Series Description element values to include for processing.
- idPrefixesToReplaceArray: Array of incorrect ID prefix values in the DICOM "Patient ID" element to replace with correctIdPrefixStr.
- correctIdPrefixStr: Correct ID prefix that replaces the incorrect ID prefix values.
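The ID prefix replacement can be pictured with a small shell sketch. It assumes the incorrect prefixes are matched as literal strings at the start of the Patient ID; the app may treat them differently. The function name fix_id_prefix and the sample values in the usage note are hypothetical.

```shell
# Hypothetical sketch: if the ID starts with one of the incorrect prefixes,
# swap that prefix for the correct one; otherwise print the ID unchanged.
# Usage: fix_id_prefix ID CORRECT_PREFIX BAD_PREFIX...
fix_id_prefix() {
  id=$1
  correct=$2
  shift 2
  for bad in "$@"; do
    case $id in
      "$bad"*)
        # Strip the literal bad prefix and prepend the correct one.
        printf '%s%s\n' "$correct" "${id#"$bad"}"
        return
        ;;
    esac
  done
  printf '%s\n' "$id"
}
```

For example, fix_id_prefix hlp_1234 UM hlp_ bmh_ prints UM1234, while an ID that starts with none of the listed prefixes is printed as-is.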
copy.conf:
- sourceDirPathStr: Path string of the source directory that contains all the DICOM files you want to copy.
- targetDirPathStr: Path string of the target directory that the DICOM files and their directory tree will be copied to.
- intermedDirsRegexArray: Array of regular expression strings for directory names that will be included in the source and target directory trees.
- dicomFilenameRegexArray: Array of regular expression strings for acceptable DICOM filenames to copy.
- seriesDescriptionRegexArray: Array of regular expression strings for acceptable DICOM file SeriesDescription attribute values to copy.
deidentify.conf:
- sourceDirPathStr: Path string of the directory containing the copied DICOM files to deidentify, i.e., the target directory that the DICOM files and their directory tree were copied to.
- dicomAttributesToReplaceWithZero: Array of strings for DICOM attributes whose values will be replaced with zero-length (empty) strings.
zip.conf:
- sourceDirPathStr: Path string of the source directory that contains the directories or DICOM files that will be zipped.
- targetDirPathStr: Path string of the target directory that the zipped directories or zipped DICOM files will be placed in. Note that the directory tree above any zipped directory or DICOM file is preserved.
- zipDepth: Depth in directory tree to zip directories containing DICOM files, or DICOM files themselves.
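To preview which entries sit at a given depth before choosing zipDepth, something like the following sketch may help. It assumes depth 1 means the direct children of sourceDirPathStr, which may not match how the app counts depth; the function name entries_at_depth is hypothetical. The -mindepth and -maxdepth options are available in GNU and BSD find but are not part of POSIX.

```shell
# Hypothetical helper: list the files and directories exactly `depth`
# levels below `root`, one per line, sorted.
# Assumption: depth 1 = direct children of root.
entries_at_depth() {
  root=$1
  depth=$2
  find "$root" -mindepth "$depth" -maxdepth "$depth" | sort
}
```

With a tree laid out as subject/session/file, depth 1 lists the subject directories and depth 2 lists the session directories.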
upload.conf:
- sourceDirPathStr: Path string of the source directory that the zipped directories or zipped DICOM files have been placed in.
- uploadDepth: Depth in directory tree where the zip files to be uploaded are.
- awsAccessKeyId: AWS access key ID.
- awsSecretAccessKey: AWS secret access key.
- s3BucketStr: S3 bucket string, e.g., "my-s3-bucket" in "s3://my-s3-bucket".
- s3KeyPrefixStr: S3 key prefix, e.g., "folder1/folder2/" in "s3://my-s3-bucket/folder1/folder2/zipfile.zip".
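How s3BucketStr and s3KeyPrefixStr combine into an object location can be sketched as below, following the example values above. It assumes the object key is simply the key prefix followed by the zip file's name; whether the app also keeps any directory components in the key is not stated here. The function name s3_uri is hypothetical.

```shell
# Hypothetical sketch: compose the S3 URI for one zip file.
# Assumption: object key = key prefix + zip file name.
s3_uri() {
  bucket=$1
  key_prefix=$2
  zip_path=$3
  printf 's3://%s/%s%s\n' "$bucket" "$key_prefix" "$(basename "$zip_path")"
}
```

Note that the prefix needs its trailing slash ("folder1/folder2/"); without it the file name would be appended directly to "folder2".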
Now that you've got the app configured, package it into a JAR file using sbt:
sbt package
This will build a JAR file, target/scala-2.13/dicom-deidentify_2.13-0.1.jar.
To run the packaged app JAR from the command line, define the paths of its dependency JARs in your shell, then pass those paths to Java's classpath option (as in java -cp).
First, define the path of the dependency JARs:
SCALA="lib/scala-library-2.13.3.jar" && \
PIXELMED="lib/pixelmed.jar" && \
LB_CORE="lib/logback-core-1.2.3.jar" && \
LB_CLASSIC="lib/logback-classic-1.2.3.jar" && \
SLF4J="lib/slf4j-api-1.7.30.jar" && \
PICOCLI="lib/picocli-4.2.0.jar" && \
CONFIG="lib/config-1.3.0.jar" && \
ZIP="lib/zt-zip-1.14.jar" && \
S3="lib/aws-java-sdk-1.7.4.jar" && \
CL="lib/commons-logging-1.1.1.jar" && \
CC="lib/commons-codec-1.3.jar" && \
HTTP_CORE="lib/httpcore-4.2.jar" && \
HTTP_CLIENT="lib/httpclient-4.2.jar" && \
JAX_CORE="lib/jackson-core-2.1.1.jar" && \
JAX_DATA="lib/jackson-databind-2.1.1.jar" && \
JAX_ANNOT="lib/jackson-annotations-2.1.1.jar" && \
APP="target/scala-2.13/dicom-deidentify_2.13-0.1.jar"
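A misspelled JAR path only shows up later as a class-loading error, so it can be worth checking the paths first. The two helpers below are a hedged sketch, not part of the app; the names missing_jars and join_classpath are hypothetical.

```shell
# Hypothetical helper: report each path that is not a regular file on
# stderr; return non-zero if at least one is missing.
missing_jars() {
  status=0
  for jar in "$@"; do
    if [ ! -f "$jar" ]; then
      printf 'missing: %s\n' "$jar" >&2
      status=1
    fi
  done
  return "$status"
}

# Hypothetical helper: join paths with ':' to form a classpath string.
join_classpath() {
  # "$*" joins the arguments with the first character of IFS.
  ( IFS=:; printf '%s\n' "$*" )
}
```

For instance, missing_jars "$SCALA" "$PIXELMED" "$APP" prints nothing when all three files exist, and join_classpath "$SCALA" "$PIXELMED" "$APP" prints a string suitable for java -cp.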
Run the sbt-packaged JAR, passing the dependency JARs to the classpath option along with the class that contains the main method for copying only: Copy.
java -cp \
$SCALA:$PIXELMED:$LB_CORE:$LB_CLASSIC:$SLF4J:$PICOCLI:$CONFIG:$APP \
edu.umich.med.alzheimers.dicom.copy.Copy
Note that the Copy class is in the subpackage copy of the edu.umich.med.alzheimers.dicom package.
Same as Copy above, but pass the class that contains the main method for deidentifying only: Deidentify.
java -cp \
$SCALA:$PIXELMED:$LB_CORE:$LB_CLASSIC:$SLF4J:$PICOCLI:$CONFIG:$APP \
edu.umich.med.alzheimers.dicom.deidentify.Deidentify
Same as Copy above, but pass the class that contains the main method for zipping only: Zip.
java -cp \
$SCALA:$PIXELMED:$LB_CORE:$LB_CLASSIC:$SLF4J:$PICOCLI:$CONFIG:$ZIP:$APP \
edu.umich.med.alzheimers.dicom.zip.Zip
Same as Zip above, but pass the class that contains the main method for uploading only: Upload.
java -cp \
$SCALA:$PIXELMED:$LB_CORE:$LB_CLASSIC:$SLF4J:$PICOCLI:$CONFIG:$S3:$CC:$CL:$HTTP_CLIENT:$HTTP_CORE:$JAX_CORE:$JAX_DATA:$JAX_ANNOT:$APP \
edu.umich.med.alzheimers.dicom.upload.Upload
Run the sbt-packaged JAR (defined as APP="target/scala-2.13/dicom-deidentify_2.13-0.1.jar" above), passing the dependency JARs to the classpath option along with the class that contains the main method for doing all four steps: CopyDeidentifyZipUpload.
java -cp \
$SCALA:$PIXELMED:$LB_CORE:$LB_CLASSIC:$SLF4J:$PICOCLI:$CONFIG:$ZIP:\
$S3:$CC:$CL:$HTTP_CLIENT:$HTTP_CORE:$JAX_CORE:$JAX_DATA:$JAX_ANNOT:$APP \
edu.umich.med.alzheimers.dicom.CopyDeidentifyZipUpload
@LDNicolasMay - Idea and work.
Thanks to Dr. David A. Clunie for his PixelMed Java DICOM Toolkit.
Thanks also to Saravanan Subramainian for his DICOM Tutorials.