This project reads tweets from the Twitter Streaming API and publishes the "Corona"-related ones to Google Cloud Pub/Sub. An Apache Beam pipeline then processes the published tweets and saves the results to BigQuery.
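The filter-and-publish step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names `is_corona_related`, `encode_for_pubsub`, and `select_messages`, the simple keyword match, and the tweet fields (`id`, `text`) are all assumptions.

```python
import json

KEYWORD = "corona"  # assumed filter term, taken from the description above


def is_corona_related(text: str) -> bool:
    """Return True if the tweet text mentions the keyword (case-insensitive)."""
    return KEYWORD in text.lower()


def encode_for_pubsub(tweet: dict) -> bytes:
    """Serialize a tweet as UTF-8 JSON; Pub/Sub message payloads are bytes."""
    return json.dumps(tweet, ensure_ascii=False).encode("utf-8")


def select_messages(tweets):
    """Yield encoded payloads for the tweets that pass the filter."""
    for tweet in tweets:
        if is_corona_related(tweet.get("text", "")):
            yield encode_for_pubsub(tweet)
```

Each payload could then be handed to `PublisherClient.publish(topic_path, data)` from the `google-cloud-pubsub` package.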
cd my_streaming_pipeline/
conda create python=3.7 -p venv/ -y
conda activate venv/
pip install -e .
pip install -r requirements.txt
pip install "apache-beam[gcp]"
- Get a BEARER_TOKEN from your Twitter developer application
- Create a GCP project
- Create a GCP Pub/Sub topic
export BEARER_TOKEN=""
mystream
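The export above makes the token available to the process started by `mystream`. How the project reads it is not shown here; one common pattern, sketched below with a hypothetical helper named `auth_headers`, is to read the variable and send it as a bearer `Authorization` header, failing early when it is unset or empty.

```python
import os


def auth_headers(env=os.environ) -> dict:
    """Build the Authorization header from BEARER_TOKEN (hypothetical helper)."""
    token = env.get("BEARER_TOKEN", "").strip()
    if not token:
        # An empty export (BEARER_TOKEN="") is treated the same as unset.
        raise RuntimeError("BEARER_TOKEN is not set; export it before running mystream")
    return {"Authorization": f"Bearer {token}"}
```

Passing the environment as a parameter keeps the helper easy to check without touching the real shell environment, for example `auth_headers({"BEARER_TOKEN": "abc"})`.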
- Create a Pub/Sub subscription on the topic
- Create a BigQuery dataset and table
python src/mystream/subscriber.py
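On the subscriber side, the Beam step that turns Pub/Sub messages into BigQuery rows might look like the sketch below. The row fields (`id`, `text`, `created_at`), the function names `to_row` and `run`, and the `subscription` and `table` arguments are assumptions for illustration; the real logic lives in `src/mystream/subscriber.py` and may differ. `apache_beam` is imported inside `run` so that the parsing helper can be used on its own.

```python
import json


def to_row(message: bytes) -> dict:
    """Decode a Pub/Sub payload (UTF-8 JSON) into a flat BigQuery row."""
    tweet = json.loads(message.decode("utf-8"))
    return {
        "id": str(tweet.get("id", "")),
        "text": tweet.get("text", ""),
        "created_at": tweet.get("created_at"),  # None becomes NULL in BigQuery
    }


def run(subscription: str, table: str) -> None:
    """Stream from Pub/Sub to BigQuery.

    subscription: "projects/<project>/subscriptions/<name>"
    table: "<project>:<dataset>.<table>" (assumed to exist already)
    """
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(streaming=True)
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "Read" >> beam.io.ReadFromPubSub(subscription=subscription)
            | "Parse" >> beam.Map(to_row)
            | "Write" >> beam.io.WriteToBigQuery(
                table,
                create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            )
        )
```

`CREATE_NEVER` is used here because the setup steps above create the table by hand, so the sketch does not need to declare a schema.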