apache-beam
There are 261 repositories under the apache-beam topic.
tensorflow/tfx
TFX is an end-to-end platform for deploying production ML pipelines
GoogleCloudPlatform/DataflowTemplates
Cloud Dataflow Google-provided templates for solving in-Cloud data tasks
nielsbasjes/yauaa
Yet Another UserAgent Analyzer
GoogleCloudPlatform/flink-on-k8s-operator
[DEPRECATED] Kubernetes operator for managing the lifecycle of Apache Flink and Beam applications.
blockchain-etl/bitcoin-etl
ETL scripts for Bitcoin, Litecoin, Dash, Zcash, Doge, Bitcoin Cash. Available in Google BigQuery https://goo.gl/oY5BCQ
google/weather-tools
Tools to make weather data accessible and useful.
spotify/flink-on-k8s-operator
Kubernetes operator for managing the lifecycle of Apache Flink and Beam applications.
google/tensorflow-recorder
TFRecorder makes it easy to create TensorFlow records (TFRecords) from Pandas DataFrames and CSV files containing images or structured data.
google/fhir-data-pipes
A collection of tools for extracting FHIR resources and analytics services on top of that data.
ngrunwald/datasplash
Clojure API for a more dynamic Google Dataflow
mohaseeb/beam-nuggets
Collection of transforms for the Apache Beam Python SDK.
blockchain-etl/blockchain-etl-streaming
Streaming Ethereum and Bitcoin blockchain data to Google Pub/Sub or Postgres in Kubernetes
tosun-si/asgarde
Asgarde simplifies error handling with Apache Beam Java, using less code that is more concise and expressive.
mercari/DataflowTemplate
Mercari Dataflow Template
Fematich/mlengine-boilerplate
Repository to quickly get you started with new Machine Learning projects on Google Cloud Platform. More info (slides):
yu-iskw/bigquery-to-datastore
Export a whole BigQuery table to Google Datastore with Apache Beam/Google Dataflow
xmlking/micro-apps
Microservices in Post-Kubernetes Era. A polyglot monorepo
luisbelloch/data_processing_course
Some class materials for a data processing course using PySpark
blockchain-etl/blockchain-etl-architecture
Blockchain ETL Architecture
doitintl/banias
Opinionated serverless event analytics pipeline
tosun-si/pasgarde
Asgarde simplifies error handling with Apache Beam Python, using less code that is more concise and expressive.
asaharland/beam-pipeline-examples
Apache Beam examples for running on Google Cloud Dataflow.
NucleusEngineering/hack-your-pipe
Efficient streaming data ingestion, transformation & activation
mozilla-services/foxsec-pipeline
Log analysis pipeline utilizing Apache Beam
mercari/DataflowTemplates
Convenient Dataflow pipelines for transforming data between cloud data sources
sayakpaul/count-tokens-hf-datasets
This project shows how to derive the total number of training tokens from a large text dataset from 🤗 datasets with Apache Beam and Dataflow.
google-parfait/dataset_grouper
Libraries for efficient and scalable group-structured dataset pipelines.
janaom/gcp-data-engineering-etl-with-composer-dataflow
This project leverages GCS, Composer, Dataflow, BigQuery, and Looker on Google Cloud Platform (GCP) to build a robust data engineering solution for processing, storing, and reporting daily transaction data in the online food delivery industry.
esakik/beam-mysql-connector
Apache Beam I/O connector designed for accessing MySQL databases. https://beam.apache.org/documentation/io/connectors/#other-io-connectors-for-apache-beam
google/consent-based-conversion-adjustments
Code to statistically up-weight conversion values of consenting customers to feed up to 100% of the factual conversion values back into Google Ads.
carted/processing-text-data
Presents an optimized Apache Beam pipeline for generating sentence embeddings (runnable on Cloud Dataflow).
cassiobolba/Data-Engineering
Projects and studies in the area of data engineering
datastacktv/apache-beam-explained
Source code for the YouTube video, Apache Beam Explained in 12 Minutes
delftdata/stateflow
Prototype which extracts stateful dataflows by analysing Python code.
datastacktv/apache-beam-batch-processing
Public source code for the Batch Processing with Apache Beam (Python) online course