Report ETL Pipeline is a Dagster pipeline that extracts radiological reports (contained in SR modality instances) from a PACS (using ADIT) and transfers them to RADIS to build a full-text search index. The pipeline contains two jobs:

- `collect_reports_job` collects all reports since the year 2012 (using Dagster backfills) and also has a schedule that collects the previous day's reports during the night and sends them to RADIS.
- `revise_reports_job` has a schedule that collects reports from the day 7 days earlier, picking up reports that were changed or added afterwards, and sends those to RADIS.
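The two schedules differ only in which day they target. A minimal sketch of that date arithmetic (plain Python for illustration; the function names are hypothetical, not the pipeline's actual code):

```python
from datetime import date, timedelta

def collect_partition(today: date) -> date:
    # collect_reports_job's nightly schedule targets the previous day.
    return today - timedelta(days=1)

def revise_partition(today: date) -> date:
    # revise_reports_job targets the day 7 days earlier, so reports
    # changed or added after the first collection are picked up again.
    return today - timedelta(days=7)
```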
- Both development and production use Docker Compose to set up the Dagster server. The `dagster_home_dev` (resp. `dagster_home_prod`) folder in the workspace is mounted as the `DAGSTER_HOME` folder. All data output by the pipelines is stored in those folders.
- Copy `example.env` to `.env.dev` (resp. `.env.prod`) and edit the settings in there.
- Artifacts are stored according to `ARTIFACTS_DIR`. If `ARTIFACTS_DIR` is not set, the files are stored in the `DAGSTER_HOME` folder under `storage`.
- A relative `ARTIFACTS_DIR` path is resolved relative to `DAGSTER_HOME`, which is the `dagster_home_dev` folder in development and the `dagster_home_prod` folder in production.
- Production uses Nginx for basic auth and SSL encryption.
- Generate a password file for basic authentication with `htpasswd -c .htpasswd <username>` (requires apache2-utils to be installed).
- Generate an SSL certificate with `openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout ssl.key -out ssl.crt` (nothing has to be filled out).
- Activate the virtual environment with `poetry shell` and then start the stack with `inv compose-up` or `inv compose-up --env prod`.
- Forward port `3500` in development (resp. `3600` in production) to the Dagster UI in the VS Code ports tab.
- Alternatively (for testing purposes), run a single job from the command line, e.g. `python ./scripts/materialize_assets.py -d ./artifacts/ 2023-01-01`.