Hive is not included in the current Feast roadmap; this project aims to add Hive support for the Offline Store.
For more details, see this Feast issue.
Important: This project is still under development and is not ready for use yet. If you need it, please let me know and I will probably give it higher priority.
pip install feast
Install the latest dev version with pip:
pip install git+https://github.com/baineng/feast-hive.git
or by cloning the repo:
git clone https://github.com/baineng/feast-hive.git
cd feast-hive
python setup.py install
feast init feature_repo
cd feature_repo
Set the offline_store type to feast_hive.HiveOfflineStore:
project: ...
registry: ...
provider: local
offline_store:
    type: feast_hive.HiveOfflineStore
    host: localhost
    port: 10000 # default
    ... # other parameters
online_store:
    ...
# This is an example feature definition file
from google.protobuf.duration_pb2 import Duration
from feast import Entity, Feature, FeatureView, ValueType
from feast_hive import HiveSource
# Read data from a Hive table.
# Make sure the table_ref exists and contains data before continuing.
driver_hourly_stats = HiveSource(
    table_ref='example.driver_stats',
    event_timestamp_column="datetime",
    created_timestamp_column="created",
)
# Define an entity for the driver.
driver = Entity(name="driver_id", value_type=ValueType.INT64, description="driver id",)
# Define the FeatureView
driver_hourly_stats_view = FeatureView(
    name="driver_hourly_stats",
    entities=["driver_id"],
    ttl=Duration(seconds=86400 * 1),
    features=[
        Feature(name="conv_rate", dtype=ValueType.FLOAT),
        Feature(name="acc_rate", dtype=ValueType.FLOAT),
        Feature(name="avg_daily_trips", dtype=ValueType.INT64),
    ],
    online=True,
    input=driver_hourly_stats,
    tags={},
)
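The ttl above is 86400 seconds, i.e. one day. As a rough illustration of what that means during a point-in-time join, the sketch below checks whether a feature row is recent enough to be joined to an entity row. It is a conceptual model only, not code from Feast or feast-hive; the helper name within_ttl is hypothetical, and the exact boundary handling (inclusive or exclusive) may differ in the real offline store.

```python
from datetime import datetime, timedelta

# Same value as Duration(seconds=86400 * 1) in the feature view: one day.
TTL = timedelta(seconds=86400 * 1)


def within_ttl(feature_time, entity_time, ttl=TTL):
    """Hypothetical helper: True if a feature row is eligible for the join.

    A row is eligible when it is not newer than the entity timestamp and
    not older than the entity timestamp minus the ttl. Boundary handling
    here is an assumption made for illustration.
    """
    return entity_time - ttl <= feature_time <= entity_time
```

Under this model, a feature row written three hours before the entity timestamp would be joined, while one written two days earlier would be ignored.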
feast apply
The rest is the same as in the Feast Quickstart.
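For reference, a minimal sketch of the historical retrieval step from the Quickstart, adapted to the feature view defined above. It assumes a Feast version from the same era as the definitions in this README (one that accepts input= on FeatureView and feature_refs= on get_historical_features); newer releases renamed these arguments. The helper names and the driver ids used in the note below are hypothetical.

```python
from datetime import datetime, timedelta

# Feature references in "<feature view>:<feature>" form, matching the
# driver_hourly_stats view defined earlier.
FEATURE_REFS = [
    "driver_hourly_stats:conv_rate",
    "driver_hourly_stats:acc_rate",
    "driver_hourly_stats:avg_daily_trips",
]


def build_entity_rows(driver_ids, event_time):
    """Hypothetical helper: one entity row per driver at a given timestamp."""
    return [{"driver_id": d, "event_timestamp": event_time} for d in driver_ids]


def fetch_training_df(repo_path, entity_rows):
    """Hypothetical helper: point-in-time join against the offline store.

    pandas and feast are imported lazily so the module can be imported
    without them; calling this requires both, plus a reachable Hive server.
    """
    import pandas as pd
    from feast import FeatureStore

    store = FeatureStore(repo_path=repo_path)
    entity_df = pd.DataFrame(entity_rows)
    # Assumption: older Feast API that takes feature_refs=.
    job = store.get_historical_features(
        entity_df=entity_df, feature_refs=FEATURE_REFS
    )
    return job.to_df()
```

From inside feature_repo, something like fetch_training_df(".", build_entity_rows([1001, 1002], datetime.now() - timedelta(hours=1))) should return a DataFrame with one row per driver, provided the Hive table contains matching data.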
git clone https://github.com/baineng/feast-hive.git
cd feast-hive
# creating virtual env ...
pip install -e .[dev]
# before committing
make format
make lint
pip install -e .[test]
pytest --hive_host=localhost --hive_port=10000
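The tests need a Hive server listening on the given host and port. A small pre-flight check like the sketch below can save a confusing test failure. It uses only the Python standard library and is not part of feast-hive; the function name hive_reachable is hypothetical. Note that it only confirms a TCP port is open, not that HiveServer2 is actually answering queries.

```python
import socket


def hive_reachable(host="localhost", port=10000, timeout=3.0):
    """Hypothetical helper: True if a TCP connection to host:port succeeds."""
    try:
        # create_connection raises OSError (refused, timed out, unresolved host).
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Running hive_reachable("localhost", 10000) before pytest gives a quick yes/no on whether the --hive_host and --hive_port values point at something that is listening.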