Streaming pipeline

This project reads data from the Twitter Streaming API and publishes "Corona"-related tweets to Google Cloud Pub/Sub. Once the tweets are published, an Apache Beam pipeline processes them and saves the results to BigQuery.
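The README does not say exactly how a tweet is judged to be "Corona"-related. A minimal sketch, assuming a case-insensitive substring match against a hypothetical keyword list (not necessarily the project's actual rule), could look like this:

```python
# Hypothetical keyword list; the project's real matching rule is not documented.
CORONA_KEYWORDS = ("corona",)


def is_corona_related(text: str, keywords=CORONA_KEYWORDS) -> bool:
    """Return True if any keyword occurs anywhere in the tweet text, ignoring case."""
    lowered = text.casefold()
    return any(keyword.casefold() in lowered for keyword in keywords)


if __name__ == "__main__":
    for tweet in ["Coronavirus cases rise again", "Lovely weather today"]:
        print(is_corona_related(tweet), tweet)
```

A substring match also catches "Coronavirus" and "#corona", at the cost of false positives such as "Corona beer".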

Installation

cd my_streaming_pipeline/
conda create python=3.7 -p venv/ -y
conda activate venv/
pip install -e .
pip install -r requirements.txt
pip install "apache-beam[gcp]"

Running the project

Publish the tweets to GCP Pub/Sub

Get the BEARER_TOKEN from your Twitter developer application
Create a GCP project
Create a GCP Pub/Sub topic
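The publisher authenticates to Twitter with the BEARER_TOKEN. As a sketch only, assuming the project uses the Twitter API v2 filtered-stream endpoints (the README just says "Twitter Streaming API"), the authenticated requests can be built with the standard library. The rule value and tag are hypothetical, and nothing is sent over the network here:

```python
import json
import os
import urllib.request

# Twitter API v2 filtered-stream endpoints (assumed; the README does not name them).
RULES_URL = "https://api.twitter.com/2/tweets/search/stream/rules"
STREAM_URL = "https://api.twitter.com/2/tweets/search/stream"


def auth_headers(token: str) -> dict:
    """App-only authentication: the bearer token goes in the Authorization header."""
    return {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}


def add_rule_request(token: str, value: str = "corona", tag: str = "corona") -> urllib.request.Request:
    """Build (but do not send) a POST that adds a stream rule matching `value`."""
    body = json.dumps({"add": [{"value": value, "tag": tag}]}).encode("utf-8")
    return urllib.request.Request(RULES_URL, data=body, headers=auth_headers(token), method="POST")


def stream_request(token: str) -> urllib.request.Request:
    """Build (but do not send) the long-lived GET that delivers matching tweets."""
    return urllib.request.Request(STREAM_URL, headers=auth_headers(token))


if __name__ == "__main__":
    token = os.environ.get("BEARER_TOKEN", "")
    if not token:
        raise SystemExit("Set BEARER_TOKEN first")
    request = add_rule_request(token)
    print(request.get_method(), request.full_url)
```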

export BEARER_TOKEN="<your-bearer-token>"
mystream
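The `mystream` command publishes the tweets, but its internals are not shown in this README. The following is only a sketch of publishing with the google-cloud-pubsub client, assuming each tweet is a dict sent as UTF-8 JSON; `PROJECT_ID`, `TOPIC_ID` and the helper names are hypothetical:

```python
import json
from typing import Callable, Iterable

# Hypothetical identifiers; use the project and topic you created.
PROJECT_ID = "my-gcp-project"
TOPIC_ID = "corona-tweets"


def encode_tweet(tweet: dict) -> bytes:
    """Pub/Sub message data must be bytes, so serialize the tweet as UTF-8 JSON."""
    return json.dumps(tweet, ensure_ascii=False).encode("utf-8")


def publish_all(tweets: Iterable[dict], publish: Callable[[bytes], object]) -> int:
    """Send each encoded tweet through `publish` and return how many were sent."""
    count = 0
    for tweet in tweets:
        publish(encode_tweet(tweet))
        count += 1
    return count


def make_pubsub_publish(project_id: str = PROJECT_ID, topic_id: str = TOPIC_ID) -> Callable[[bytes], str]:
    """Return a publish function backed by google-cloud-pubsub (imported lazily)."""
    from google.cloud import pubsub_v1

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(project_id, topic_id)

    def publish(data: bytes) -> str:
        # publish() is asynchronous; result() blocks until the server returns the message ID.
        return publisher.publish(topic_path, data=data).result()

    return publish
```

Passing `publish` in as a function keeps the encoding logic testable without GCP credentials; in real use it would be called as `publish_all(tweets, make_pubsub_publish())`. Blocking on `result()` for every message is simple but slow; collecting the futures and waiting on them in batches would raise throughput.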

Consume the tweets and save them to BigQuery

Create a Pub/Sub subscription for the topic
Create a BigQuery dataset and table
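The subscription and BigQuery steps above can also be scripted. This is a sketch under assumptions: all resource names and the two-column schema are hypothetical, and the google-cloud-pubsub and google-cloud-bigquery clients are imported only when `create_resources()` runs:

```python
# Hypothetical identifiers; replace them with the resources you create.
PROJECT_ID = "my-gcp-project"
TOPIC_ID = "corona-tweets"
SUBSCRIPTION_ID = "corona-tweets-sub"
DATASET_ID = "twitter"
TABLE_ID = "corona_tweets"

# Hypothetical table schema: one nullable STRING column per tweet field.
TWEET_SCHEMA = [("id", "STRING"), ("text", "STRING")]


def topic_path() -> str:
    """Fully qualified Pub/Sub topic name."""
    return f"projects/{PROJECT_ID}/topics/{TOPIC_ID}"


def subscription_path() -> str:
    """Fully qualified Pub/Sub subscription name."""
    return f"projects/{PROJECT_ID}/subscriptions/{SUBSCRIPTION_ID}"


def full_table_id() -> str:
    """BigQuery table ID in project.dataset.table form."""
    return f"{PROJECT_ID}.{DATASET_ID}.{TABLE_ID}"


def create_resources() -> None:
    """Create the subscription, dataset and table if they do not exist yet."""
    from google.api_core.exceptions import AlreadyExists
    from google.cloud import bigquery, pubsub_v1

    subscriber = pubsub_v1.SubscriberClient()
    try:
        subscriber.create_subscription(request={"name": subscription_path(), "topic": topic_path()})
    except AlreadyExists:
        pass  # Re-running the script is harmless.

    client = bigquery.Client(project=PROJECT_ID)
    client.create_dataset(f"{PROJECT_ID}.{DATASET_ID}", exists_ok=True)
    schema = [bigquery.SchemaField(name, field_type) for name, field_type in TWEET_SCHEMA]
    client.create_table(bigquery.Table(full_table_id(), schema=schema), exists_ok=True)


if __name__ == "__main__":
    create_resources()
```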

python src/mystream/subscriber.py
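`subscriber.py` consumes the tweets with Apache Beam and saves them to BigQuery. The following is not the project's actual subscriber, only a minimal sketch of such a streaming pipeline, assuming UTF-8 JSON messages with `id` and `text` fields; the subscription, table and schema values are hypothetical:

```python
import json

# Hypothetical identifiers; replace them with your own resources.
SUBSCRIPTION = "projects/my-gcp-project/subscriptions/corona-tweets-sub"
TABLE = "my-gcp-project:twitter.corona_tweets"
SCHEMA = "id:STRING,text:STRING"


def to_bigquery_row(message: bytes) -> dict:
    """Decode one Pub/Sub payload (UTF-8 JSON) into a row matching SCHEMA."""
    tweet = json.loads(message.decode("utf-8"))
    return {"id": str(tweet["id"]), "text": tweet.get("text")}


def run() -> None:
    """Streaming pipeline: Pub/Sub -> parse -> BigQuery (apache-beam imported lazily)."""
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions

    options = PipelineOptions()  # Parses command-line flags such as --runner.
    options.view_as(StandardOptions).streaming = True  # Pub/Sub reads need streaming mode.

    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "ReadTweets" >> beam.io.ReadFromPubSub(subscription=SUBSCRIPTION)
            | "ToRow" >> beam.Map(to_bigquery_row)
            | "WriteRows" >> beam.io.WriteToBigQuery(
                TABLE,
                schema=SCHEMA,
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            )
        )


if __name__ == "__main__":
    run()
```

Because `PipelineOptions()` parses command-line flags, the same script could be pointed at Dataflow with `--runner=DataflowRunner` plus the usual project, region and temp-location flags instead of running locally.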

SelinGungor/streaming_pipeline