Apache Beam / Google Dataflow pipeline for streaming messages from a Pub/Sub queue to a MarkLogic database
This is an Apache Beam streaming pipeline for content ingestion into a MarkLogic database. The pipeline consists of three steps:
- Read messages from a Pub/Sub event queue into a PCollection.
- Envelope the JSON payload and add provenance information.
- Upload the transformed JSON to a MarkLogic database.
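
The enveloping step can be illustrated in isolation. The following is a minimal sketch in plain Python, not the pipeline's actual transform: the function name `envelope`, the field names `headers`, `source`, `ingestedAt`, and `instance`, and the example subscription path are all assumptions made for illustration.

```python
import json
from datetime import datetime, timezone


def envelope(payload: bytes, source: str, received_at: datetime) -> str:
    """Wrap a JSON message payload in an envelope carrying provenance headers.

    All field names below are hypothetical; the real pipeline may use a
    different envelope layout.
    """
    # Pub/Sub delivers message data as bytes; decode and parse it as JSON.
    content = json.loads(payload.decode("utf-8"))
    document = {
        "envelope": {
            # Provenance: where the message came from and when it was ingested.
            "headers": {
                "source": source,
                "ingestedAt": received_at.astimezone(timezone.utc).isoformat(),
            },
            # The original payload, unchanged.
            "instance": content,
        }
    }
    return json.dumps(document)


if __name__ == "__main__":
    # Hypothetical subscription path, used only as a provenance label.
    print(envelope(b'{"id": 1}',
                   "projects/demo/subscriptions/example",
                   datetime.now(timezone.utc)))
```

In a Beam pipeline, a function like this would typically be applied to each element of the PCollection between the read and the upload steps, for example through a `Map` transform.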
The cloud deployment of this pipeline runs an instance of StreamToMarkLogic