/beam-workshop

Getting Started with an Apache Beam Development Environment

Primary language: Python. License: Apache License 2.0.

Link to Slides

Install Cloud SDK

  • (Optional) Request a Test Google Account
  • Check whether it is already installed by running gcloud in a bash terminal
  • Follow the appropriate quickstart for your OS
  • Initialize the tool
    • gcloud init
      • Create your own project when prompted
  • Set application default credentials
    • gcloud auth application-default login
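
The checks above can also be scripted. The following is a minimal sketch, not part of the workshop material: the helper names gcloud_path and adc_available are hypothetical, and it assumes that a successful gcloud auth application-default print-access-token call means application default credentials are in place.

```python
import shutil
import subprocess
from typing import Optional


def gcloud_path() -> Optional[str]:
    """Return the full path to the gcloud executable, or None if it is not on PATH."""
    return shutil.which("gcloud")


def adc_available(timeout_seconds: int = 30) -> bool:
    """Best-effort check that application default credentials are set up.

    Assumption: if gcloud can print an access token for the application
    default credentials, then `gcloud auth application-default login`
    has already been completed.
    """
    path = gcloud_path()
    if path is None:
        return False
    try:
        result = subprocess.run(
            [path, "auth", "application-default", "print-access-token"],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    if gcloud_path() is None:
        print("gcloud not found: follow the Cloud SDK quickstart for your OS")
    elif not adc_available():
        print("Run: gcloud auth application-default login")
    else:
        print("Cloud SDK and application default credentials look ready")
```

Running the script before the workshop should tell you which of the setup steps, if any, still needs to be done.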

Download IDE (Integrated Development Environment)

For Java Developers:

For Python Developers:

For Developers with only a browser:

Launch your IDE

On the welcome screen:

  • Select Checkout from Version Control
  • Choose Git from the dropdown

Enter git repo URL:

Running Beam in Cloud Dataflow

Set the following required pipeline arguments:

  • --runner=DataflowRunner
  • --project=YOUR_PROJECT_ID
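
As an illustration for Python pipelines, the two required arguments can be assembled in code. This is a sketch under assumptions: dataflow_args is a hypothetical helper, YOUR_PROJECT_ID is a placeholder, and depending on your SDK version Dataflow may also expect options such as --region and --temp_location, which can be passed through the extra parameter.

```python
from typing import List, Optional


def dataflow_args(project_id: str, extra: Optional[List[str]] = None) -> List[str]:
    """Build the pipeline arguments listed above for a Dataflow run.

    `extra` is a hypothetical hook for any further flags your setup needs,
    for example ["--region=us-central1", "--temp_location=gs://my-bucket/tmp"].
    """
    if not project_id:
        raise ValueError("project_id must not be empty")
    args = ["--runner=DataflowRunner", f"--project={project_id}"]
    if extra:
        args.extend(extra)
    return args


if __name__ == "__main__":
    # Replace YOUR_PROJECT_ID with the project created during `gcloud init`.
    print(" ".join(dataflow_args("YOUR_PROJECT_ID")))
```

With the Beam Python SDK installed, the resulting list can be passed to PipelineOptions from apache_beam.options.pipeline_options, and the options object can then be handed to beam.Pipeline(options=...).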

More Dataflow Pipeline Options

Build and Run Beam SDK Examples

Launch IntelliJ and on the welcome screen:

Python

Run the following commands in the sdks/python directory:

  • virtualenv env
  • source env/bin/activate
  • pip install -e .[gcp]
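
Once the install finishes, a small local pipeline is one way to confirm that the environment works. The sketch below is not taken from the workshop: beam_installed and run_smoke_test are hypothetical names, and the pipeline runs on Beam's default local runner rather than on Dataflow.

```python
import importlib.util


def beam_installed() -> bool:
    """Return True if the apache_beam package can be imported in this environment."""
    return importlib.util.find_spec("apache_beam") is not None


def run_smoke_test() -> None:
    """Count words in a tiny in-memory collection and print each (word, count) pair."""
    import apache_beam as beam  # imported lazily so the check above works without Beam

    with beam.Pipeline() as pipeline:  # no options given: uses the default local runner
        (
            pipeline
            | "Create" >> beam.Create(["beam", "dataflow", "beam"])
            | "Count" >> beam.combiners.Count.PerElement()
            | "Print" >> beam.Map(print)
        )


if __name__ == "__main__":
    if beam_installed():
        run_smoke_test()
    else:
        print("apache_beam not found: activate the virtualenv and rerun pip install")
```

If the script prints word counts, the virtualenv is active and the SDK is importable. The order of the printed pairs is not guaranteed.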