/apache-beam-golang-udf

Run UDFs (User Defined Functions) on Apache Beam Golang SDK.

Primary LanguageGoMIT LicenseMIT

Apache Beam Golang UDF

Run UDFs (User Defined Functions) on Apache Beam Golang SDK.

Go Modules activation (if necessary):

export GO111MODULE=on

CSV parse example:

# Direct
go run examples/parse/csv/csv.go

# Direct with internet files
go run examples/parse/csv/csv.go --location="http://localhost:8081/"

# Direct with GCS files
go run examples/parse/csv/csv.go --location=gs://apache-beam-golang-udf

# Dataflow with GCS files
go run examples/parse/csv/csv.go --location=gs://apache-beam-golang-udf \
    --max_num_workers=1 \
    --num_workers=1 \
    --project=marcelo-henrique-neppel \
    --runner=dataflow \
    --staging_location=gs://apache-beam-golang-udf/bin \
    --temp_location=gs://apache-beam-golang-udf/temp \
    --worker_harness_container_image=apachebeam/go_sdk:latest \
    --worker_machine_type=n1-standard-1

# Flink with GCS files
./gradlew :runners:flink:1.9:job-server:runShadow # For using the embedded cluster

./gradlew :runners:flink:1.9:job-server:runShadow -PflinkMasterUrl=localhost:8081 # For using a separate cluster

go run examples/parse/csv/csv.go --location=gs://apache-beam-golang-udf \
    --endpoint=localhost:8099 \
    --runner=flink

JSON parse example:

# Direct
go run examples/parse/json/json.go

# Direct with internet files
go run examples/parse/json/json.go --location="http://localhost:8081/"

# Direct with GCS files
go run examples/parse/json/json.go --location=gs://apache-beam-golang-udf

# Dataflow with GCS files
go run examples/parse/json/json.go --location=gs://apache-beam-golang-udf \
    --max_num_workers=1 \
    --num_workers=1 \
    --project=marcelo-henrique-neppel \
    --runner=dataflow \
    --staging_location=gs://apache-beam-golang-udf/bin \
    --temp_location=gs://apache-beam-golang-udf/temp \
    --worker_harness_container_image=apachebeam/go_sdk:latest \
    --worker_machine_type=n1-standard-1

# Flink with GCS files
./gradlew :runners:flink:1.9:job-server:runShadow # For using the embedded cluster

./gradlew :runners:flink:1.9:job-server:runShadow -PflinkMasterUrl=localhost:8081 # For using a separate cluster

go run examples/parse/json/json.go --location=gs://apache-beam-golang-udf \
    --endpoint=localhost:8099 \
    --runner=flink

XML parse example:

# Direct
go run examples/parse/xml/xml.go

# Direct with internet files
go run examples/parse/xml/xml.go --location="http://localhost:8081/"

# Direct with GCS files
go run examples/parse/xml/xml.go --location=gs://apache-beam-golang-udf

# Dataflow with GCS files
go run examples/parse/xml/xml.go --location=gs://apache-beam-golang-udf \
    --max_num_workers=1 \
    --num_workers=1 \
    --project=marcelo-henrique-neppel \
    --runner=dataflow \
    --staging_location=gs://apache-beam-golang-udf/bin \
    --temp_location=gs://apache-beam-golang-udf/temp \
    --worker_harness_container_image=apachebeam/go_sdk:latest \
    --worker_machine_type=n1-standard-1

# Flink with GCS files
./gradlew :runners:flink:1.9:job-server:runShadow # For using the embedded cluster

./gradlew :runners:flink:1.9:job-server:runShadow -PflinkMasterUrl=localhost:8081 # For using a separate cluster

go run examples/parse/xml/xml.go --location=gs://apache-beam-golang-udf \
    --endpoint=localhost:8099 \
    --runner=flink

On examples using GCS files, please upload the example files to one of your buckets first.