Feathub is a feature store that facilitates feature development and deployment to achieve the following objectives:
- Reduce duplication of data engineering efforts by allowing new ML projects to reuse and share a library of curated production-ready features already registered by existing projects in the same organization.
- Simplify feature management by allowing users to specify feature definitions and feature processing jobs as code using a declarative framework.
- Facilitate feature development-to-deployment iteration by allowing users to use the same declarative feature definitions across training and serving, online and offline, without training-serving skew. Feathub takes care of compiling feature definitions into efficient processing jobs and executing those jobs in a distributed cluster.
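Using one definition for training and serving means that both paths run literally the same transformation code. The minimal plain-Python sketch below illustrates that idea. It does not use the Feathub API. The function `trip_features` and the column names `trip_distance` and `fare_amount` are hypothetical. The offline (training) path and the online (serving) path both call the same function, so they cannot compute the feature differently.

```python
from typing import Dict, List

Row = Dict[str, float]


def trip_features(row: Row) -> Row:
    """Hypothetical feature definition shared by training and serving."""
    distance = row["trip_distance"]
    fare = row["fare_amount"]
    return {
        "trip_distance": distance,
        # Guard against division by zero for zero-distance trips.
        "fare_per_mile": fare / distance if distance > 0 else 0.0,
    }


def compute_offline(rows: List[Row]) -> List[Row]:
    """Training path: apply the shared definition to a historical batch."""
    return [trip_features(r) for r in rows]


def compute_online(request: Row) -> Row:
    """Serving path: apply the same definition to a single live request."""
    return trip_features(request)


if __name__ == "__main__":
    history = [
        {"trip_distance": 2.0, "fare_amount": 10.0},
        {"trip_distance": 0.0, "fare_amount": 3.0},
    ]
    print(compute_offline(history))
    print(compute_online({"trip_distance": 4.0, "fare_amount": 18.0}))
```

In Feathub the shared definition is declarative and compiled into processing jobs, not an ordinary Python function. The principle is the same: a single source of truth for every environment.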
Feathub provides an SDK and infrastructure that enable the following capabilities:
- Define feature views (groups of related features) as transformations and joins of existing feature views and data sources.
- Register and retrieve feature views by name from the feature registry.
- Transform and materialize features for a given time range and/or set of keys from a feature view into feature stores, applying transformations to the source dataset with point-in-time correctness.
- Fetch online features by joining features from the online feature store with on-demand transformations.
The above figures show the Feathub architecture. Please check out Feathub architecture for more details on these components.
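Point-in-time correctness, mentioned in the capability list, means that each training row may only see the latest feature value whose timestamp is not later than that row's own timestamp. This rule prevents future data from leaking into the training set. The minimal plain-Python sketch below illustrates the rule. It assumes hypothetical `(key, event_time, value)` tuples with integer timestamps. It is not how Feathub implements the join, because Feathub compiles feature definitions into distributed processing jobs.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: one feature stored as (key, event_time, value) records.
FeatureRecord = Tuple[str, int, float]
Index = Dict[str, Tuple[List[int], List[float]]]


def build_index(records: List[FeatureRecord]) -> Index:
    """Group feature records by key, with event times sorted ascending."""
    grouped: Dict[str, List[Tuple[int, float]]] = defaultdict(list)
    for key, ts, value in records:
        grouped[key].append((ts, value))
    index: Index = {}
    for key, pairs in grouped.items():
        pairs.sort()
        index[key] = ([ts for ts, _ in pairs], [v for _, v in pairs])
    return index


def point_in_time_lookup(index: Index, key: str, as_of: int) -> Optional[float]:
    """Return the latest value with event_time <= as_of, or None if none exists."""
    if key not in index:
        return None
    times, values = index[key]
    # bisect_right counts how many event times are <= as_of.
    pos = bisect.bisect_right(times, as_of)
    return values[pos - 1] if pos > 0 else None


def point_in_time_join(
    labels: List[Tuple[str, int]], index: Index
) -> List[Tuple[str, int, Optional[float]]]:
    """Attach to each (key, timestamp) label row the feature value as of that time."""
    return [(key, ts, point_in_time_lookup(index, key, ts)) for key, ts in labels]


if __name__ == "__main__":
    idx = build_index([("a", 10, 1.0), ("a", 20, 2.0), ("b", 5, 7.0)])
    print(point_in_time_join([("a", 15), ("a", 5), ("b", 100)], idx))
    # → [('a', 15, 1.0), ('a', 5, None), ('b', 100, 7.0)]
```

The row `("a", 5)` gets `None` because its only candidate values were recorded at times 10 and 20, both after the row's timestamp.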
Prerequisites for building Python packages:
- Unix-like operating system (e.g. Linux, Mac OS X)
- Python 3.7
- Java 8
- Maven >= 3.1.1
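The hypothetical helper script below checks these prerequisites before a build. It is a minimal sketch using only the standard library. It checks for exactly Python 3.7, as listed above. It checks only whether `java` and `mvn` are on `PATH`, not which Java version is installed. It compares the version reported by `mvn -v` against 3.1.1.

```python
import re
import shutil
import subprocess
import sys
from typing import Dict, Optional, Tuple


def parse_version(text: str) -> Optional[Tuple[int, ...]]:
    """Extract the first dotted version number (e.g. '3.8.6') from tool output."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    return tuple(int(part) for part in match.group(1).split(".")) if match else None


def maven_ok(version_output: str, minimum: Tuple[int, ...] = (3, 1, 1)) -> bool:
    """True if the version found in `mvn -v` output is at least `minimum`."""
    version = parse_version(version_output)
    return version is not None and version >= minimum


def check_prerequisites() -> Dict[str, bool]:
    """Report which prerequisites are visible from the current environment."""
    return {
        "python_3_7": sys.version_info[:2] == (3, 7),
        "java_on_path": shutil.which("java") is not None,
        "mvn_on_path": shutil.which("mvn") is not None,
    }


if __name__ == "__main__":
    report = check_prerequisites()
    if report["mvn_on_path"]:
        out = subprocess.run(["mvn", "-v"], capture_output=True, text=True).stdout
        report["maven_at_least_3_1_1"] = maven_ok(out)
    print(report)
```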
Run the following commands to install Feathub from source.
# Build Java dependencies for Feathub
$ cd java
$ mvn clean package
$ cd ..
# Install Feathub
$ python -m pip install ./python
Execute the following command to run the nyc_taxi.py demo, which demonstrates the capabilities described above.
$ python python/feathub/examples/nyc_taxi.py
If you are interested in computing the Feathub features with a local Flink cluster, you can follow the Flink Processor Quickstart.
- This tutorial provides more details on how to define, extract and serve features using Feathub.
- This document explains the Feathub expression language.
- This document introduces the Flink processor that computes the features with Flink.
Run the following commands to install the development dependencies and run the Python tests.
$ python -m pip install -r python/dev-requirements.txt
$ pytest -W ignore::DeprecationWarning
Feathub uses Black to format Python code, flake8 to check Python code style, and mypy to check type annotations.
Run the following commands to format code, check code style, and check type annotations before uploading PRs for review.
# Format python code
$ python -m black python
# Check python code style
$ python -m flake8 --config=python/setup.cfg python
# Check python type annotation
$ python -m mypy --config-file python/setup.cfg python
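The hypothetical helper below runs the three commands in order. It reports how many of them failed and supports a `dry_run` mode that only prints them. It is a convenience sketch, not part of the Feathub repository. It assumes that you run it from the repository root so the relative `python` and `python/setup.cfg` paths resolve. Note that the Black command rewrites files in place.

```python
import subprocess
import sys
from typing import List

# The three commands listed above, run with the current interpreter.
CHECKS: List[List[str]] = [
    [sys.executable, "-m", "black", "python"],
    [sys.executable, "-m", "flake8", "--config=python/setup.cfg", "python"],
    [sys.executable, "-m", "mypy", "--config-file", "python/setup.cfg", "python"],
]


def run_checks(checks: List[List[str]] = CHECKS, dry_run: bool = False) -> int:
    """Run each command in order and return how many exited with a non-zero code."""
    failures = 0
    for cmd in checks:
        print("$", " ".join(cmd))
        if dry_run:
            continue
        if subprocess.run(cmd).returncode != 0:
            failures += 1
    return failures


if __name__ == "__main__":
    sys.exit(1 if run_checks() else 0)
```

The helper runs every command even after a failure, so a single pass reports all style, lint, and type problems.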