Open Control Plane for Tables in Data Lakehouse

Primary language: Java. License: BSD 2-Clause "Simplified" (BSD-2-Clause).

OpenHouse

Control Plane for Tables in Open Data Lakehouses

OpenHouse is an open source control plane designed for efficient management of tables within open data lakehouse deployments. The control plane comprises a declarative catalog and a suite of data services. Users can seamlessly define Tables, their schemas, and associated metadata declaratively within the catalog. OpenHouse reconciles the observed state of Tables with the desired state by orchestrating various data services.
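
The reconciliation idea can be illustrated with a minimal sketch. Everything in it is hypothetical: the class name, the method names, and the use of plain string maps to stand in for table state are assumptions made for illustration, not OpenHouse APIs. A real control plane would presumably invoke a data service where the sketch simply copies a value.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

// Hypothetical illustration only: none of these names come from the OpenHouse codebase.
class ReconcileSketch {

    // Return the keys whose desired value differs from the observed value, in sorted order.
    static List<String> driftedKeys(Map<String, String> desired, Map<String, String> observed) {
        List<String> drifted = new ArrayList<>();
        for (Map.Entry<String, String> entry : new TreeMap<>(desired).entrySet()) {
            if (!Objects.equals(entry.getValue(), observed.get(entry.getKey()))) {
                drifted.add(entry.getKey());
            }
        }
        return drifted;
    }

    // Produce the next observed state by acting on every drifted key.
    static Map<String, String> reconcile(Map<String, String> desired, Map<String, String> observed) {
        Map<String, String> next = new HashMap<>(observed);
        for (String key : driftedKeys(desired, observed)) {
            // Assumption: a real system would trigger a data service here;
            // the sketch just copies the desired value.
            next.put(key, desired.get(key));
        }
        return next;
    }

    public static void main(String[] args) {
        Map<String, String> desired = new HashMap<>();
        desired.put("schema", "v2");
        desired.put("retention", "30d");

        Map<String, String> observed = new HashMap<>();
        observed.put("schema", "v1");
        observed.put("retention", "30d");

        System.out.println(driftedKeys(desired, observed));  // prints [schema]
        Map<String, String> next = reconcile(desired, observed);
        System.out.println(driftedKeys(desired, next));      // prints []
    }
}
```

The sketch is written for Java 8, matching the build requirement listed under Prerequisites. The point it shows is that the user only declares the desired state, and the loop derives the work from the difference between desired and observed.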

Getting Started

Prerequisites

To build and run OpenHouse locally with Docker Compose, you need the following:

  • Java
    • OpenHouse is currently built with Java 8; support for newer Java versions is planned.
    • Set the JAVA_HOME environment variable to the location of your JDK 8 installation.
  • Docker
  • Docker Compose
  • Python 3

To deploy OpenHouse to Kubernetes, you need the following:

Building OpenHouse

To build OpenHouse, run the following command:

./gradlew build

Running OpenHouse with Docker Compose

To run OpenHouse, we recommend following the SETUP guide. It brings up all the OpenHouse services along with MySQL, Prometheus, Apache Spark, and HDFS.

Deploying OpenHouse to Kubernetes

To deploy OpenHouse to Kubernetes, follow the DEPLOY guide. It covers building the container images for all the OpenHouse services and deploying them to a Kubernetes cluster using Helm.

Compatibility Matrix

OpenHouse is built against the following versions of open-source projects:

Project                Version
Apache Iceberg         1.2.0
Apache Spark           3.1.2
Apache Livy            0.7.0-incubating
Apache Hadoop Client   2.10.0
Spring Boot Framework  2.6.6
OpenAPI                3.0.3

Contributing

We welcome contributions to OpenHouse. To get involved, please refer to the CONTRIBUTING guide for details. For an overview of the high-level architecture, see the ARCHITECTURE guide.