Proof of concept for ingesting Twitter streams into Google BigQuery via Google Dataflow
Project layout:
- `beam-jobs/` - Python Beam jobs for loading tweets into BigQuery and for generating a list of trending topics.
- `ingest-twitter/` - scripts to ingest Tweet streams.
- `sample-data/` - some already collected tweets.
- `terraform/` - infrastructure setup.
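The two jobs in `beam-jobs/` come down to a per-tweet transform (raw tweet to table row) and an aggregation (counting topics). The following is a minimal, stdlib-only sketch of that logic, not the repository's code. It assumes the tweets are newline-delimited JSON in the Twitter v1.1 tweet format (fields `id_str`, `created_at`, `text`, `user.screen_name`, `entities.hashtags`) and that "trending" means "most frequent hashtags". The function names and the row schema are hypothetical. In a Beam pipeline, `tweet_to_row` would typically run inside a `beam.Map` before writing to BigQuery, and the counting would map onto Beam's `beam.combiners.Count.PerElement` and `beam.combiners.Top.Of` combiners.

```python
import json
from collections import Counter
from typing import Iterable, List, Tuple


def tweet_to_row(line: str) -> dict:
    """Flatten one JSON-encoded tweet into a BigQuery-style row.

    Assumes the Twitter v1.1 tweet layout; the output schema is hypothetical.
    """
    tweet = json.loads(line)
    hashtags = tweet.get("entities", {}).get("hashtags", [])
    return {
        "id": tweet["id_str"],
        "created_at": tweet.get("created_at"),
        "user": tweet.get("user", {}).get("screen_name"),
        "text": tweet.get("text", ""),
        # Lower-case hashtags so "#GCP" and "#gcp" count as the same topic.
        "hashtags": [h["text"].lower() for h in hashtags],
    }


def trending_topics(rows: Iterable[dict], n: int = 10) -> List[Tuple[str, int]]:
    """Return the n most frequent hashtags as (hashtag, count) pairs."""
    counts = Counter(tag for row in rows for tag in row["hashtags"])
    return counts.most_common(n)


if __name__ == "__main__":
    raw = [
        json.dumps({"id_str": "1", "text": "#BigQuery on #GCP",
                    "user": {"screen_name": "alice"},
                    "entities": {"hashtags": [{"text": "BigQuery"}, {"text": "GCP"}]}}),
        json.dumps({"id_str": "2", "text": "more #bigquery",
                    "user": {"screen_name": "bob"},
                    "entities": {"hashtags": [{"text": "bigquery"}]}}),
        json.dumps({"id_str": "3", "text": "no tags here"}),
    ]
    rows = [tweet_to_row(line) for line in raw]
    print(trending_topics(rows, n=2))  # [('bigquery', 2), ('gcp', 1)]
```

Normalising hashtags to lower case at load time keeps the trending computation a plain frequency count; whether the real jobs do this is not stated here.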
Setup:
- Set up the infrastructure as described in the `terraform/` folder.
- Get Twitter stream data with the scripts in the `ingest-twitter/` folder, or use the supplied sample data.
- Copy the Twitter data to the Google Cloud Storage bucket created during the infrastructure setup.
- Install the Python dependencies: `pip install -r requirements.txt`
- In `beam-jobs/`, adjust the parameters in `run-import-on-gcp.sh`, then run it.
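The parameters of `run-import-on-gcp.sh` are not listed here. As a hedged sketch, assuming the script passes command-line flags through to a Python Beam job, a common pattern is to parse the job-specific flags with `argparse`'s `parse_known_args` and hand the remaining flags (for example `--runner=DataflowRunner`, `--project`, `--region`, `--temp_location`) to Beam's `PipelineOptions`. The `--input` and `--output_table` flags, the `split_args` helper, and the bucket, project and table names below are hypothetical.

```python
import argparse
from typing import List, Optional, Tuple


def split_args(argv: Optional[List[str]] = None) -> Tuple[argparse.Namespace, List[str]]:
    """Separate job-specific flags from flags meant for the Beam runner.

    --input and --output_table are hypothetical flag names for this sketch.
    """
    parser = argparse.ArgumentParser(description="Load tweets into BigQuery")
    parser.add_argument("--input", required=True,
                        help="GCS glob of tweet files, e.g. gs://BUCKET/tweets/*.json")
    parser.add_argument("--output_table", required=True,
                        help="BigQuery table as PROJECT:DATASET.TABLE")
    # Unrecognised flags (runner, project, region, temp_location, ...) are
    # returned untouched so they can be forwarded to PipelineOptions.
    return parser.parse_known_args(argv)


if __name__ == "__main__":
    known, pipeline_args = split_args([
        "--input=gs://my-bucket/tweets/*.json",
        "--output_table=my-project:twitter.tweets",
        "--runner=DataflowRunner",
        "--project=my-project",
        "--region=europe-west1",
        "--temp_location=gs://my-bucket/tmp",
    ])
    print(known.input, known.output_table)
    print(pipeline_args)
    # With apache-beam installed, the remainder would typically be used as:
    #   from apache_beam.options.pipeline_options import PipelineOptions
    #   options = PipelineOptions(pipeline_args)
```

Keeping runner flags out of the job's own parser means the same job can run locally (`DirectRunner`) or on Dataflow just by changing the flags in the launch script.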