stream-registry

Announcement: 12th April 2019

We wanted to let you know that there are going to be some exciting developments with the Stream Registry project in the very near future. Stream Registry is being adopted by many brands at Expedia Group as a critical component of the group's digital nervous system for key streams. Therefore, the HomeAway Stream Registry is finding a new home.

What is changing

We will be investing in the project by expanding the existing team with full-time resources in several locations across Expedia Group. Expect greatly increased project activity: contributors, commits, issues, features, and releases.
The repository will relocate to the ExpediaGroup open source GitHub org in its entirety, preserving the history and community.

What isn't changing

The original vision of Stream Registry as a Stream Discovery and Stream Orchestration platform
The project will remain open source, and will be joined shortly by other supporting Expedia Group stream platform components
Licenses, conduct and contribution guidelines will remain unchanged
The value of your contributions - please keep them coming!

We expect the start of this journey to be a little bumpy, but please bear with us as we work towards the first release of the Expedia Group Stream Registry!

About

A Stream Registry is what its name implies: it is a registry of streams. As enterprises increasingly scale in size, the need to organize and develop around streams of data becomes paramount. Synchronous calls are attracted to the edge, and a variety of synchronous and asynchronous calls permeate the enterprise. The need for a declarative, central authority for discovery and orchestration of stream management emerges. This is what a stream registry provides. In much the same way that DNS provides a name translation service for IP addresses, a Stream Registry provides a “metadata service” for streams. By centralizing stream metadata, a stream translation service for producer and/or consumer stream coördinates becomes possible. This centralized, yet democratized, stream metadata function thus reduces operational complexity via stream lifecycle management, stream discovery, stream availability and resiliency.

Why Stream Registry?

We believe that as changes to business requirements accelerate, time-to-market pressures increase, competitive measures grow, migrations to cloud and different platforms are required, and so on, systems will increasingly need to become more reactive and dynamic in nature.

The issue of state arises.

We see many systems adopting event-driven architectures to accommodate changing business needs in these high-stakes environments. We hypothesize there is an emerging need in the industry for a centralized "stream metadata" service to help reduce the complexity of deploying and operating stream platforms that serve as a distributed, federated nervous system in the enterprise.

What is Stream Registry?

Put simply, Stream Registry is a centralized service for stream metadata.

The stream registry can answer the following questions and handle the following concerns:

Who owns the stream?
Who are the producers and consumers of the stream?
Management of stream replication across clusters and regions
Management of stream storage for permanent access
Management of stream triggers for legacy stream sources

Architecture

See the architecture/northstar documentation for more details.

Building locally

Stream Registry is built using OpenJDK 11 and Maven. For convenience, we have wrapped each Maven command in a Makefile. If you do not have make installed, please consult the Makefile for the underlying build commands.

Stream Registry is currently packaged as a shaded JAR file. We leave specific deployment considerations up to each team since this varies from enterprise to enterprise. We do, however, provide a vanilla Docker example for teams to use for demo, learning, or development purposes.

To build Stream Registry as a JAR file, please run

make build

To build Stream Registry as a Docker image, please run the following, which will use the Jib Maven Plugin to build and install the image

make build-docker

Start Stream Registry

Required Local Environment
The local 'dev' version of Stream Registry requires a locally running version of Apache Kafka and Confluent's Schema Registry on ports 9092 and 8081, respectively.

To quickly get a local dev environment set up, we recommend using the provided Docker Compose file. Be sure to first build the Docker image using the command above.

docker-compose up

Alternatively, you can start Confluent Platform locally by downloading the Confluent CLI and running the following commands. Note: The confluent command is currently only available for macOS and Linux. If using Windows, you'll need to use Docker, or run ZooKeeper, Kafka, and the Schema Registry individually.

confluent start zookeeper
confluent start kafka
confluent start schema-registry
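
Before starting Stream Registry, it can help to confirm that something is actually listening on ports 9092 and 8081. The following is a minimal sketch, not part of the project: the function names port_open and check_local_env are hypothetical, and the check relies on bash's /dev/tcp redirection, so it assumes bash rather than a plain POSIX sh. It only verifies that the ports accept TCP connections, not that the services behind them are healthy.

```shell
#!/usr/bin/env bash
# Sketch only: port_open and check_local_env are hypothetical helper names.

# port_open HOST PORT
# Succeeds if a TCP connection to HOST:PORT can be opened. This relies on
# bash's /dev/tcp redirection, so it needs bash rather than plain sh.
port_open() {
  local host="$1" port="$2"
  (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

# check_local_env
# Reports whether the two ports required by the local 'dev' setup are open.
# Returns non-zero if either port cannot be reached.
check_local_env() {
  local status=0
  if port_open localhost 9092; then
    echo "Kafka: reachable on port 9092"
  else
    echo "Kafka: not reachable on port 9092"
    status=1
  fi
  if port_open localhost 8081; then
    echo "Schema Registry: reachable on port 8081"
  else
    echo "Schema Registry: not reachable on port 8081"
    status=1
  fi
  return "$status"
}

# Usage: check_local_env && make run
```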

Stream Registry can then be started with

make run

Once Stream Registry has started, check that the application's Swagger API is running at http://localhost:8080/swagger
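
Startup can take a few seconds, so a single request to the Swagger URL may fail even when nothing is wrong. One way to script the check is a small retry loop. This is a sketch under assumptions: retry and swagger_up are hypothetical helper names, and the attempt count and delay in the usage line are arbitrary.

```shell
#!/usr/bin/env bash
# Sketch only: retry and swagger_up are hypothetical helper names.

# retry ATTEMPTS DELAY COMMAND...
# Runs COMMAND until it succeeds or ATTEMPTS tries have been made,
# sleeping DELAY seconds between tries. Returns 0 on the first success.
retry() {
  local attempts="$1" delay="$2"
  shift 2
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Only sleep if another attempt is still to come.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# swagger_up
# Succeeds if the Swagger URL from this README answers without an HTTP error.
swagger_up() {
  curl --silent --fail --location --output /dev/null http://localhost:8080/swagger
}

# Usage: retry 30 2 swagger_up && echo "Stream Registry is up"
```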

Create a Stream Locally

First, create your cluster

curl -X PUT --header 'Content-Type: application/json' --header 'Accept: application/json' -d '{
  "clusterKey": {
    "vpc": "localRegion",
    "env": "local",
    "hint": "primary",
    "type": null
  },
  "clusterValue": {
    "clusterName": "localCluster",
    "bootstrapServers": "localhost:9092",
    "zookeeperQuorum": "zookeeper:2181",
    "schemaRegistryURL": "http://localhost:8081"
  }
}' 'http://localhost:8080/v0/clusters'

Now, declare your stream

Here is a sample stream

curl -X PUT --header 'Content-Type: application/json' --header 'Accept: application/json' -d '{
  "name": "sampleStream",
  "schemaCompatibility": "BACKWARD",
  "latestKeySchema": {
    "id": "string",
    "version": 0,
    "schemaString": "\"string\"",
    "created": "string",
    "updated": "string"
  },
  "latestValueSchema": {
    "id": "string",
    "version": 0,
    "schemaString": "\"string\"",
    "created": "string",
    "updated": "string"
  },
  "owner": "string",
  "created": 0,
  "updated": 0,
  "githubUrl": "http://github.com",
  "isDataNeededAtRest": true,
  "isAutomationNeeded": true,
  "tags": {
    "productId": 0,
    "portfolioId": 0,
    "brand": "string",
    "assetProtectionLevel": "string",
    "componentId": "string",
    "hint": "primary"
  },
  "vpcList": [
    "localRegion"
  ],
  "replicatedVpcList": [
  ],
  "topicConfig": {},
  "partitions": 1,
  "replicationFactor": 1
}' 'http://localhost:8080/v0/streams/sampleStream'
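
To confirm the stream was registered, you can try reading it back. This sketch assumes that the API answers a GET on the same path used for the PUT above, which this README does not state; check the Swagger page to confirm the available operations. BASE_URL, stream_url and get_stream are hypothetical names introduced for illustration.

```shell
#!/usr/bin/env bash
# Sketch only. BASE_URL, stream_url and get_stream are hypothetical names,
# and support for GET on this path is an assumption to confirm in Swagger.
BASE_URL="${BASE_URL:-http://localhost:8080}"

# stream_url NAME
# Prints the URL for the named stream, matching the PUT path used above.
stream_url() {
  printf '%s/v0/streams/%s\n' "$BASE_URL" "$1"
}

# get_stream NAME
# Fetches the stream's metadata as JSON; exits non-zero on an HTTP error.
get_stream() {
  curl --silent --show-error --fail \
    --header 'Accept: application/json' \
    "$(stream_url "$1")"
}

# Usage: get_stream sampleStream
```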

Kafka Version Compatibility

Stream Registry development and initial deployment started with Kafka 0.11.0 / Confluent Platform 3.3.0, and has also been deployed against Kafka 1.1.1 / Confluent Platform 4.1.2.
As per the Kafka Compatibility Matrix, we expect Stream Registry to be compatible with Kafka 0.10.0 and newer. The versions of the internal Java Kafka clients used by Stream Registry can be found in the pom.xml.