A simple setup to demo batch and streaming workloads using PyIceberg
This repo is geared towards GCP workloads
We run Postgres locally in a Docker container as a JDBC catalog, but this could easily be swapped for Cloud SQL on GCP, an Iceberg REST catalog, or any other catalog that Iceberg supports
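
As a rough illustration of the catalog setup, the sketch below builds the properties PyIceberg needs for a Postgres-backed SQL catalog. The host, port, credentials, database name and `gs://my-bucket/warehouse` path are all hypothetical placeholders, not values from this repo, and `open_catalog` assumes PyIceberg is installed with its Postgres extra (`pip install "pyiceberg[sql-postgres]"`).

```python
def jdbc_catalog_properties(
    host: str = "localhost",
    port: int = 5432,
    user: str = "iceberg",          # hypothetical credentials
    password: str = "iceberg",      # hypothetical credentials
    database: str = "iceberg",      # hypothetical database name
    warehouse: str = "gs://my-bucket/warehouse",  # hypothetical bucket
) -> dict:
    """Build PyIceberg properties for a Postgres-backed SQL catalog."""
    return {
        "type": "sql",
        "uri": f"postgresql+psycopg2://{user}:{password}@{host}:{port}/{database}",
        "warehouse": warehouse,
    }


def open_catalog(name: str, properties: dict):
    """Open the catalog; needs PyIceberg and a reachable Postgres instance."""
    # Imported lazily so the properties helper works without PyIceberg.
    from pyiceberg.catalog import load_catalog

    return load_catalog(name, **properties)


if __name__ == "__main__":
    props = jdbc_catalog_properties()
    print(props["uri"])
```

Switching to Cloud SQL or a REST catalog should mostly be a matter of changing these properties (for example `"type": "rest"` with the REST endpoint as `uri`), while the rest of the code keeps using the same catalog object.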
This is an Apache Beam/Cloud Dataflow pipeline that ingests data from a Pub/Sub topic and writes it to GCS following the Iceberg table spec
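
A minimal sketch of what such a pipeline could look like is shown here; it is not the repo's actual code. It assumes messages are UTF-8 JSON with two hypothetical fields (`event_id`, `value`) and uses Beam's managed Iceberg sink. The topic, table name, catalog properties and the 60-second trigger interval are placeholders supplied by the caller or chosen for illustration.

```python
import json
from typing import NamedTuple


class Event(NamedTuple):
    """Hypothetical message schema; replace with the real fields."""
    event_id: str
    value: float


def parse_event(payload: bytes) -> Event:
    """Decode one Pub/Sub payload (assumed to be UTF-8 JSON) into an Event."""
    record = json.loads(payload.decode("utf-8"))
    return Event(event_id=str(record["event_id"]), value=float(record["value"]))


def run(topic: str, table: str, catalog_name: str, catalog_properties: dict) -> None:
    """Stream from Pub/Sub into an Iceberg table (requires apache-beam[gcp])."""
    # Imported lazily so parse_event can be used without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions
    from apache_beam.transforms import managed

    # Register Event as a schema-aware type so Beam can convert it to rows.
    beam.coders.registry.register_coder(Event, beam.coders.RowCoder)

    options = PipelineOptions(streaming=True)
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            # topic looks like "projects/<project>/topics/<topic>"
            | "ReadPubSub" >> beam.io.ReadFromPubSub(topic=topic)
            | "Parse" >> beam.Map(parse_event).with_output_types(Event)
            | "WriteIceberg" >> managed.Write(
                managed.ICEBERG,
                config={
                    "table": table,  # e.g. "namespace.table_name"
                    "catalog_name": catalog_name,
                    "catalog_properties": catalog_properties,
                    # Assumed interval for committing streaming writes.
                    "triggering_frequency_seconds": 60,
                },
            )
        )
```

Keeping the parsing step as a plain function makes it easy to unit test without starting a pipeline, for example `parse_event(b'{"event_id": "a1", "value": 3}')`.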
A simple Python notebook