Processes bounded and unbounded data and writes to PubSub and BigQuery. Currently runs on Google Dataflow (Apache Beam) and supports processing of Google Analytics hits and AWS Kinesis events.
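For illustration, a minimal streaming Beam pipeline in this style could look like the sketch below. This is not the project's actual pipeline; the topic, table, and field names are hypothetical.

```java
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.options.StreamingOptions;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptor;

public class MinimalStreamingPipeline {
  public static void main(String[] args) {
    StreamingOptions options =
        PipelineOptionsFactory.fromArgs(args).as(StreamingOptions.class);
    options.setStreaming(true); // unbounded source -> streaming mode

    Pipeline p = Pipeline.create(options);
    p.apply("ReadHits", PubsubIO.readStrings()
            .fromTopic("projects/my-project/topics/ga-hits")) // hypothetical topic
     .apply("ToTableRow", MapElements
            .into(TypeDescriptor.of(TableRow.class))
            .via(payload -> new TableRow().set("payload", payload)))
     .apply("WriteToBigQuery", BigQueryIO.writeTableRows()
            .to("my-project:analytics.hits") // hypothetical table
            .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
            .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER));
    p.run();
  }
}
```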
A generic pipeline that serializes JSON to dynamic protobuf messages based on schemas in Cloud Storage, with support for BigQuery data types as field options in the protobuf schemas.
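A minimal sketch of the JSON-to-dynamic-protobuf step, assuming a FileDescriptorSet produced by `protoc --descriptor_set_out=...` and, for brevity, read from a local file rather than Cloud Storage. The schema file, message name, and sample JSON are made up, and the descriptor is assumed to have no imports.

```java
import com.google.protobuf.DescriptorProtos.FileDescriptorSet;
import com.google.protobuf.Descriptors.Descriptor;
import com.google.protobuf.Descriptors.FileDescriptor;
import com.google.protobuf.DynamicMessage;
import com.google.protobuf.util.JsonFormat;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

public class JsonToDynamicProto {
  // Parse a JSON string into a DynamicMessage described by the given descriptor.
  public static DynamicMessage parse(String json, Descriptor descriptor) throws Exception {
    DynamicMessage.Builder builder = DynamicMessage.newBuilder(descriptor);
    // Tolerate JSON fields that are not present in the schema.
    JsonFormat.parser().ignoringUnknownFields().merge(json, builder);
    return builder.build();
  }

  public static void main(String[] args) throws Exception {
    try (InputStream in = Files.newInputStream(Paths.get("schemas.desc"))) {
      FileDescriptorSet set = FileDescriptorSet.parseFrom(in);
      // Assumes the first file in the set has no imports.
      FileDescriptor fd = FileDescriptor.buildFrom(set.getFile(0), new FileDescriptor[] {});
      Descriptor descriptor = fd.findMessageTypeByName("Event"); // hypothetical message
      System.out.println(parse("{\"name\":\"pageview\"}", descriptor));
    }
  }
}
```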
Queries BigQuery backup tables and publishes to PubSub for backfill purposes.
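A sketch of such a backfill job under assumed names: read rows from a BigQuery backup table and republish them to a PubSub topic. The table, topic, and row-to-string conversion are illustrative only.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptors;

public class BackfillPipeline {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
    p.apply("ReadBackup", BigQueryIO.readTableRows()
            .from("my-project:backup.hits_20190101")) // hypothetical backup table
     .apply("RowToString", MapElements
            .into(TypeDescriptors.strings())
            // Simplistic conversion; a real backfill would re-serialize rows
            // into the original message format expected downstream.
            .via(row -> row.toString()))
     .apply("Publish", PubsubIO.writeStrings()
            .to("projects/my-project/topics/backfill")); // hypothetical topic
    p.run();
  }
}
```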
Added support for IP filters; a filtering sketch follows below.
Modified the schema for time dimensions to use BigQuery types.
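A minimal sketch of what an IP filter can look like, assuming for simplicity that each element is the visitor IP itself; the blocked addresses are made up.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import org.apache.beam.sdk.transforms.Filter;
import org.apache.beam.sdk.values.PCollection;

public class IpFilter {
  // Example internal/office IPs to exclude; this list is made up.
  private static final Set<String> BLOCKED =
      new HashSet<>(Arrays.asList("10.0.0.1", "192.168.1.100"));

  // Drops elements whose IP is in the blocked set.
  public static PCollection<String> apply(PCollection<String> ips) {
    return ips.apply("FilterBlockedIps", Filter.by(ip -> !BLOCKED.contains(ip)));
  }
}
```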
Changed license to AGPL 3.0 or later.
Added version 2 of the measurement protocol pipeline with a more strictly typed schema.
Replaced the field localTimestamp of type STRING with a field localDateTime of type DATETIME for easier analysis in BigQuery.
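An illustrative conversion from an epoch-millis timestamp to a BigQuery DATETIME-compatible string (no time zone offset in the value); the time zone handling here is an assumption, not the project's actual logic.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

public class LocalDateTimeField {
  // BigQuery DATETIME accepts "yyyy-MM-dd'T'HH:mm:ss" without a zone offset.
  private static final DateTimeFormatter BQ_DATETIME =
      DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss");

  public static String toBigQueryDateTime(long epochMillis, String zone) {
    LocalDateTime ldt = LocalDateTime.ofInstant(
        Instant.ofEpochMilli(epochMillis), ZoneId.of(zone));
    return ldt.format(BQ_DATETIME); // e.g. "2019-03-01T14:30:00"
  }
}
```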
Reads PubSub messages with request headers stored as PubSub attributes instead of concatenated with the request query string in the body, in sync with changes in datahem.collector version 0.9.0. Jobs are configured with JSON (account -> properties -> views) to process multiple properties and views in the same job. Cleaned up code and updated documentation; see README.md in the measurementprotocol.src.main.java.org.datahem.processor.measurementprotocol folder.
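A sketch of reading headers from message attributes rather than from a concatenated payload, assuming the messages are read with `PubsubIO.readMessagesWithAttributes()`; the attribute names and output format are assumptions.

```java
import java.nio.charset.StandardCharsets;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubMessage;
import org.apache.beam.sdk.transforms.DoFn;

public class ExtractHeadersFn extends DoFn<PubsubMessage, String> {
  @ProcessElement
  public void processElement(ProcessContext c) {
    PubsubMessage message = c.element();
    // Request headers travel as PubSub attributes; names here are assumed.
    String userAgent = message.getAttribute("User-Agent");
    String remoteAddr = message.getAttribute("X-Forwarded-For");
    // The body now carries only the payload, not the query string.
    String payload = new String(message.getPayload(), StandardCharsets.UTF_8);
    c.output(remoteAddr + " " + userAgent + " " + payload);
  }
}
```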
Beam pipeline tests for all measurement protocol entities (16 tests). A new entity captures experiments from both Google Optimize and Content Experiments.
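A sketch of a Beam pipeline test in the same style, using TestPipeline and PAssert; the transform under test and the values are hypothetical stand-ins, not one of the actual entity tests.

```java
import org.apache.beam.sdk.testing.PAssert;
import org.apache.beam.sdk.testing.TestPipeline;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptors;
import org.junit.Rule;
import org.junit.Test;

public class EntityTransformTest {
  @Rule public final transient TestPipeline p = TestPipeline.create();

  @Test
  public void upperCasesSearchTerms() {
    PCollection<String> out = p
        .apply(Create.of("shoes", "socks"))
        // Stand-in for an entity transform under test.
        .apply(MapElements.into(TypeDescriptors.strings())
            .via(s -> s.toUpperCase()));
    PAssert.that(out).containsInAnyOrder("SHOES", "SOCKS");
    p.run().waitUntilFinish();
  }
}
```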
Fixed URL decoding of site search term.
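For illustration, the kind of decoding involved: treating the URL-encoded search term as UTF-8. This helper is hypothetical, not the project's actual code.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class SearchTermDecoder {
  public static String decode(String term) {
    try {
      return URLDecoder.decode(term, "UTF-8"); // "r%C3%B6da+skor" -> "röda skor"
    } catch (UnsupportedEncodingException e) {
      return term; // UTF-8 is always available; fall back to the raw term
    }
  }
}
```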
Changed the field naming to camel case instead of snake case, and fixed custom dimension and metric suffixes to support multiple custom dimensions and metrics in BigQuery.
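A hypothetical illustration of the convention: measurement protocol parameters like cd12 or cm3 map to camel-case BigQuery field names that keep their numeric suffixes, so multiple custom dimensions and metrics can coexist. The mapping function and names are assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FieldNaming {
  // Matches custom dimension (cdN) and custom metric (cmN) parameters.
  private static final Pattern CUSTOM = Pattern.compile("^(cd|cm)(\\d+)$");

  public static String toFieldName(String param) {
    Matcher m = CUSTOM.matcher(param);
    if (m.matches()) {
      String prefix = m.group(1).equals("cd") ? "customDimension" : "customMetric";
      return prefix + m.group(2); // cd12 -> customDimension12, cm3 -> customMetric3
    }
    return param;
  }
}
```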