Feast Spark Offline Store plugin

❗ Further development of this repo is discontinued. In collaboration with the Feast community, the plugin has been added to the main Feast project as of v0.19 (see feast-dev/feast#2349).

This repo contains a plugin for Feast that runs an offline store on Spark. It can be installed with pip and configured in the feature_store.yaml configuration file to interface with DataSources using Spark.

Note that this repository has not yet had a major release, as it is still a work in progress.

Contributing

We strongly encourage you to contribute to our repository. Find out more in our contribution guidelines.

Requirements

Currently, only Feast versions >=0.15.0 are supported.
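If you want to verify the requirement programmatically before installing the plugin, a minimal sketch along the following lines may help. The helper names (parse_version, meets_minimum, installed_feast_ok) are hypothetical and not part of Feast or this plugin; the check only looks at the leading numeric part of the version string and ignores pre-release suffixes.

```python
import re
from importlib import metadata

# Minimum Feast version stated in this README
MIN_FEAST = (0, 15, 0)


def parse_version(text):
    """Return the leading numeric components, padded to three, e.g. '0.19.0rc1' -> (0, 19, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    parts = tuple(int(part) for part in match.group(0).split("."))
    # Pad short versions such as '0.15' so they compare equal to '0.15.0'
    return parts + (0,) * max(0, 3 - len(parts))


def meets_minimum(version_text, minimum=MIN_FEAST):
    """Compare numerically; plain string comparison would rank '0.9' above '0.15'."""
    return parse_version(version_text) >= minimum


def installed_feast_ok():
    """True if a sufficiently recent feast distribution is installed, False otherwise."""
    try:
        return meets_minimum(metadata.version("feast"))
    except metadata.PackageNotFoundError:
        return False
```

Calling installed_feast_ok() returns False both when Feast is missing and when it is older than 0.15.0, so a caller that needs to distinguish the two cases would have to handle PackageNotFoundError itself.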

Installation

pip install feast-spark-offline-store

Or to install from source:

git clone git@github.com:Adyen/feast-spark-offline-store.git
cd feast-spark-offline-store
pip install -e '.[dev]'

Usage

Install feast and feast_spark_offline_store, then set the offline store type in feature_store.yaml to feast_spark_offline_store.spark.SparkOfflineStore:

project: example_feature_repo
registry: data/registry.db
provider: local
online_store:
    ...
offline_store:
    type: feast_spark_offline_store.spark.SparkOfflineStore
    spark_conf:
        spark.master: "local[*]"
        spark.ui.enabled: "false"
        spark.eventLog.enabled: "false"
        spark.sql.catalogImplementation: "hive"
        spark.sql.parser.quotedRegexColumnNames: "true"
        spark.sql.session.timeZone: "UTC"
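To illustrate what the spark_conf entries above amount to, here is a minimal sketch of how such key/value pairs could be applied to a Spark session builder. This is an illustration only, not necessarily how the plugin applies them internally. The helper name apply_spark_conf and the SPARK_CONF dictionary are hypothetical; the only Spark API assumed is the builder's config(key, value) method.

```python
# The same key/value pairs as the spark_conf block above
SPARK_CONF = {
    "spark.master": "local[*]",
    "spark.ui.enabled": "false",
    "spark.eventLog.enabled": "false",
    "spark.sql.catalogImplementation": "hive",
    "spark.sql.parser.quotedRegexColumnNames": "true",
    "spark.sql.session.timeZone": "UTC",
}


def apply_spark_conf(builder, spark_conf):
    """Apply every entry to a SparkSession-style builder and return the builder."""
    for key, value in spark_conf.items():
        # Spark options are passed as strings, matching the quoted YAML values
        builder = builder.config(key, str(value))
    return builder
```

With pyspark installed, this could be used as apply_spark_conf(SparkSession.builder, SPARK_CONF).getOrCreate(), with SparkSession imported from pyspark.sql. Passing the builder as an argument keeps the helper importable without Spark.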

Documentation

See the Feast documentation on offline stores and on adding custom offline stores.

License

MIT license. For more information, see the LICENSE file.