TWITTER-FLOW

This repo houses the Equity Sim Twitter analytics pipeline, written in Java with Apache Beam and run on GCP Dataflow.

Usage

Set up the Maven project (package com.equitysim) from the Apache Beam examples archetype

mvn archetype:generate \
      -DarchetypeGroupId=org.apache.beam \
      -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \
      -DarchetypeVersion=2.1.0 \
      -DgroupId=com.equitysim \
      -DartifactId=twitter_flow \
      -Dversion="0.1" \
      -Dpackage=com.equitysim \
      -DinteractiveMode=false

Build and run the twitter_flow pipeline locally (no --runner is given, so Beam's default DirectRunner is used)

mvn compile exec:java \
    -Dexec.mainClass=com.equitysim.TwitterFlowPipeline \
    -Dexec.args="--output=./output/"

Build and run the twitter_flow pipeline on GCP Dataflow (export PROJECT_ID and BUCKET_NAME first)

mvn compile exec:java \
    -Pdataflow-runner \
    -Dexec.mainClass=com.equitysim.TwitterFlowPipeline \
    -Dexec.args="--project=${PROJECT_ID} \
    --stagingLocation=gs://${BUCKET_NAME}/staging \
    --runner=DataflowRunner \
    --output=gs://${BUCKET_NAME}/output"
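The Dataflow command relies on PROJECT_ID and BUCKET_NAME being set; if either is missing, the shell expands it to an empty string and the paths become gs:///staging and gs:///output. A minimal sketch, assuming bash, of a hypothetical helper (dataflow_args is not a script in this repo) that fails fast instead and prints the same four pipeline options:

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of this repo: assembles the -Dexec.args
# value for the Dataflow run from PROJECT_ID and BUCKET_NAME.
set -euo pipefail

dataflow_args() {
  # ${VAR:?message} aborts with "message" when VAR is unset or empty,
  # so a missing project or bucket stops before Maven is invoked.
  local project="${PROJECT_ID:?PROJECT_ID must be set}"
  local bucket="${BUCKET_NAME:?BUCKET_NAME must be set}"
  # Same options as the Dataflow command above, on a single line.
  printf -- '--project=%s --stagingLocation=gs://%s/staging --runner=DataflowRunner --output=gs://%s/output\n' \
    "$project" "$bucket" "$bucket"
}
```

After sourcing the helper, pass -Dexec.args="$(dataflow_args)" in place of the quoted argument string in the Dataflow command above.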

Alternatively, build and run the twitter_flow pipeline on GCP Dataflow with the pipeline.sh script

./pipeline.sh chc-admin twitter-flow fintech-tweets run

List Dataflow jobs (--status accepts active, terminated, or all)

gcloud beta dataflow jobs list --status=active

https://drive.google.com/file/d/1oFADb-ePYLYWIBxGC3eakWABBrTtZohv/view?usp=sharing