HTTPArchive/data-pipeline

Eliminate the need to run batch pipelines

rviscomi opened this issue

Create a way for us to always use streaming inputs to Dataflow, so that a separate batch pipeline is never needed. When we need to reprocess an old crawl, we should be able to kick off a script that reads all of that crawl's HARs and publishes corresponding Pub/Sub messages, which the streaming pipeline then reads like any other input.
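
A rough sketch of what such a replay script might look like, assuming the HARs for a crawl live under a common prefix in a Cloud Storage bucket. The message schema (`bucket` and `name` fields), the `.har.gz` suffix, and the function names are placeholders for illustration, not the pipeline's actual format; the real messages would need to match whatever the streaming pipeline already expects.

```python
import json
from typing import Callable, Iterable


def build_message(bucket: str, har_name: str) -> bytes:
    """Encode one HAR location as a Pub/Sub payload (hypothetical schema)."""
    return json.dumps({"bucket": bucket, "name": har_name}).encode("utf-8")


def republish_crawl(
    bucket: str,
    har_names: Iterable[str],
    publish: Callable[[bytes], object],
    suffix: str = ".har.gz",  # assumed file suffix; adjust to the real layout
) -> int:
    """Call publish() once per HAR object name and return how many were sent."""
    count = 0
    for name in har_names:
        if not name.endswith(suffix):
            continue  # skip anything that does not look like a HAR
        publish(build_message(bucket, name))
        count += 1
    return count


def main(project: str, topic: str, bucket: str, prefix: str) -> None:
    # Imported lazily so the helpers above can be exercised without
    # the Google Cloud client libraries installed.
    from google.cloud import pubsub_v1, storage

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(project, topic)
    blobs = storage.Client().list_blobs(bucket, prefix=prefix)

    futures = []
    sent = republish_crawl(
        bucket,
        (blob.name for blob in blobs),
        lambda data: futures.append(publisher.publish(topic_path, data)),
    )
    for future in futures:
        future.result()  # block until Pub/Sub has accepted each message
    print(f"Published {sent} messages to {topic_path}")
```

Passing the publish function in as a parameter keeps the listing and filtering logic separate from the Pub/Sub client, so a dry run can substitute `print` or a list's `append` before anything is sent to the real topic.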