teams-league-java-standard-beam

This video presents a real-world use case developed with Apache Beam Java and launched with the serverless Dataflow runner on Google Cloud Platform. The job reads a JSON file from Cloud Storage, applies some transformations, and writes the result to a BigQuery table.

The link to the video that explains this use case.

Run the job with the Dataflow runner:

Batch

mvn compile exec:java \
  -Dexec.mainClass=fr.groupbees.application.TeamLeagueApp \
  -Dexec.args=" \
  --project=gb-poc-373711 \
  --runner=DataflowRunner \
  --jobName=team-league-java-job-$(date +'%Y-%m-%d-%H-%M-%S') \
  --inputJsonFile=gs://mazlum_dev/team_league/input/json/input_teams_stats_raw.json \
  --region=europe-west1 \
  --streaming=false \
  --zone=europe-west1-d \
  --tempLocation=gs://mazlum_dev/dataflow/temp \
  --gcpTempLocation=gs://mazlum_dev/dataflow/temp \
  --stagingLocation=gs://mazlum_dev/dataflow/staging \
  --serviceAccount=sa-dataflow-dev@gb-poc-373711.iam.gserviceaccount.com \
  --teamLeagueDataset=mazlum_test \
  --teamStatsTable=team_stat \
  --bqWriteMethod=FILE_LOADS \
  " \
  -Pdataflow-runner
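
The `--jobName` flag above appends a timestamp so that each launch gets a distinct name. A minimal sketch of how that name could be checked before launching is shown here. It assumes the Dataflow runner only accepts names made of lowercase letters, digits and hyphens, starting with a letter and ending with a letter or digit; the pattern is written from that assumption, and `is_valid_job_name` is a hypothetical helper that is not part of this project.

```shell
#!/bin/sh
# Build the job name the same way as the command above.
job_name="team-league-java-job-$(date +'%Y-%m-%d-%H-%M-%S')"

# Hypothetical helper. Assumed rule: lowercase letters, digits and hyphens,
# starting with a letter and ending with a letter or digit.
# LC_ALL=C keeps [a-z] limited to ASCII lowercase letters.
is_valid_job_name() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z]([-a-z0-9]*[a-z0-9])?$'
}

if is_valid_job_name "$job_name"; then
  echo "valid job name: $job_name"
else
  echo "invalid job name: $job_name" >&2
fi
```

Under that assumed rule, an uppercase prefix or an underscore in the name would be rejected, which is why the timestamp format uses hyphens only.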

Streaming

mvn compile exec:java \
  -Dexec.mainClass=fr.groupbees.application.TeamLeagueApp \
  -Dexec.args=" \
  --project=gb-poc-373711 \
  --runner=DataflowRunner \
  --jobName=team-league-java-job-$(date +'%Y-%m-%d-%H-%M-%S') \
  --inputJsonFile=gs://mazlum_dev/team_league/input/json/input_teams_stats_raw.json \
  --inputSubscription=projects/gb-poc-373711/subscriptions/team_league \
  --region=europe-west1 \
  --streaming=true \
  --zone=europe-west1-d \
  --tempLocation=gs://mazlum_dev/dataflow/temp \
  --gcpTempLocation=gs://mazlum_dev/dataflow/temp \
  --stagingLocation=gs://mazlum_dev/dataflow/staging \
  --serviceAccount=sa-dataflow-dev@gb-poc-373711.iam.gserviceaccount.com \
  --teamLeagueDataset=mazlum_test \
  --teamStatsTable=team_stat \
  --bqWriteMethod=STREAMING_INSERTS \
  " \
  -Pdataflow-runner
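
The batch and streaming commands differ only in `--streaming`, `--bqWriteMethod` and the extra `--inputSubscription` flag. The sketch below illustrates that difference by building the `-Dexec.args` value for either mode from one place. `build_exec_args` and its `batch|streaming` argument are hypothetical names introduced for illustration; every flag value is copied from the commands above. The function only prints the argument string and does not launch anything.

```shell
#!/bin/sh
# Hypothetical helper: prints the -Dexec.args value for one mode.
# Variables are global because POSIX sh has no "local".
build_exec_args() {
  mode="$1"
  project="gb-poc-373711"
  bucket="gs://mazlum_dev"

  # Flags shared by the batch and streaming commands.
  # The backslash-newlines are removed by the shell; the extra spaces
  # left by the indentation are harmless in a space-separated list.
  args="--project=${project} \
    --runner=DataflowRunner \
    --jobName=team-league-java-job-$(date +'%Y-%m-%d-%H-%M-%S') \
    --inputJsonFile=${bucket}/team_league/input/json/input_teams_stats_raw.json \
    --region=europe-west1 \
    --zone=europe-west1-d \
    --tempLocation=${bucket}/dataflow/temp \
    --gcpTempLocation=${bucket}/dataflow/temp \
    --stagingLocation=${bucket}/dataflow/staging \
    --serviceAccount=sa-dataflow-dev@${project}.iam.gserviceaccount.com \
    --teamLeagueDataset=mazlum_test \
    --teamStatsTable=team_stat"

  # Flags that differ between the two modes.
  case "$mode" in
    batch)
      args="${args} --streaming=false --bqWriteMethod=FILE_LOADS"
      ;;
    streaming)
      args="${args} --streaming=true"
      args="${args} --inputSubscription=projects/${project}/subscriptions/team_league"
      args="${args} --bqWriteMethod=STREAMING_INSERTS"
      ;;
    *)
      echo "usage: build_exec_args batch|streaming" >&2
      return 1
      ;;
  esac

  printf '%s\n' "$args"
}

# Print the batch arguments as a demonstration.
build_exec_args batch
```

The printed string could then be passed as `-Dexec.args="$(build_exec_args streaming)"` in the `mvn compile exec:java` command, with `-Dexec.mainClass` and `-Pdataflow-runner` left unchanged.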