🏋️‍♀️ Import inference data from Amazon S3, Azure Blob Storage, Google Cloud Storage and others to Aporia

Primary language: Python. License: MIT.

🏋️‍♀️ Aporia Importer

A small utility to import ML production data from your cloud storage provider and monitor it using Aporia's monitoring platform.

Installation

pip install "aporia-importer[all]"

If you only wish to install the dependencies for a specific cloud provider, you can use

pip install "aporia-importer[s3]"

Usage

aporia-importer /path/to/config.yaml

aporia-importer requires the path to a config file as a parameter; see the Configuration section.

Configuration

aporia-importer uses a YAML configuration file. There are sample configurations in the examples directory.

Currently, the model version schema must be defined manually in the configuration. The schema is a mapping of field names to field types (see here). You can find more details in our docs.

The following table describes all of the configuration fields in detail:

Field               Required  Description
source              True      The path to the files you wish to upload, e.g. s3://my-bucket/my_file.csv. Glob patterns are supported.
format              True      The format of the files you wish to upload (see Supported Data Formats)
token               True      Your Aporia authentication token
environment         True      The environment in which Aporia will be initialized (e.g. production, staging)
model_id            True      The ID of the model that the data is associated with
model_version.name  True      A name for the model version to create
model_version.type  True      The type of the model (regression, binary, multiclass)
predictions         True      A mapping of prediction fields to their field types
features            True      A mapping of feature fields to their field types
raw_inputs          False     A mapping of raw input fields to their field types
aporia_host         False     Aporia server URL. Defaults to app.aporia.com
aporia_port         False     Aporia server port. Defaults to 443
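
The field list above can be checked before running an import. The following is a minimal sketch, not part of aporia-importer: it assumes the YAML file has already been parsed into a Python dict, and that model_version.name and model_version.type are nested under a model_version key. The helper names (lookup, missing_fields, with_defaults) and the field type strings in the example are hypothetical placeholders; consult the Aporia docs for the real type names.

```python
# Field names come from the configuration table; the nesting of
# model_version under its own key is an assumption.
REQUIRED_FIELDS = (
    "source", "format", "token", "environment", "model_id",
    "model_version.name", "model_version.type", "predictions", "features",
)
# Documented defaults for the optional connection fields.
OPTIONAL_DEFAULTS = {"aporia_host": "app.aporia.com", "aporia_port": 443}


def lookup(config, dotted_key):
    """Return the value at a dotted key such as 'model_version.name', or None."""
    node = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def missing_fields(config):
    """List the required fields that are absent from the parsed config."""
    return [key for key in REQUIRED_FIELDS if lookup(config, key) is None]


def with_defaults(config):
    """Fill in the documented defaults for optional fields that were omitted."""
    return {**OPTIONAL_DEFAULTS, **config}


# Hypothetical config; the field type strings are placeholders only.
example = {
    "source": "s3://my-bucket/my_file.csv",
    "format": "csv",
    "token": "<your-token>",
    "environment": "production",
    "model_id": "my-model",
    "model_version": {"name": "v1", "type": "binary"},
    "predictions": {"will_buy": "<field-type>"},
    "features": {"age": "<field-type>"},
}

print(missing_fields(example))
print(with_defaults(example)["aporia_port"])
```

Reporting every missing field at once, rather than failing on the first, makes a broken config quicker to fix in a single pass.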

Supported Data Sources

  • Local files
  • S3

Supported Data Formats

  • csv
  • parquet

How does it work?

aporia-importer uses dask to load data from various cloud providers, and the Aporia SDK to report the data to Aporia.
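
The loading half of that flow might look roughly like the sketch below. It is an illustration under stated assumptions, not the importer's actual code: load_frame and report_all are hypothetical names, and report_row stands in for whatever Aporia SDK call reports a single record (the SDK API is deliberately not shown here). Only dask.dataframe.read_csv, dask.dataframe.read_parquet and DataFrame.to_delayed are real dask calls.

```python
SUPPORTED_FORMATS = ("csv", "parquet")


def load_frame(source, fmt):
    """Lazily open `source` (a path or glob, local or s3://) as a dask DataFrame."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    # Imported here so the format check works even without dask installed.
    import dask.dataframe as dd

    if fmt == "csv":
        return dd.read_csv(source)
    return dd.read_parquet(source)


def report_all(frame, report_row):
    """Materialize one partition at a time and pass each row to `report_row`.

    `report_row` is a placeholder callback; in the real tool this is where
    the Aporia SDK would be invoked. Returns the number of rows reported.
    """
    count = 0
    for partition in frame.to_delayed():
        pandas_frame = partition.compute()  # one partition in memory at a time
        for row in pandas_frame.to_dict(orient="records"):
            report_row(row)
            count += 1
    return count
```

With dask installed (plus s3fs for s3:// paths), a call such as report_all(load_frame("s3://my-bucket/*.csv", "csv"), print) would stream the rows partition by partition. Processing partitions one at a time keeps memory bounded even when the glob matches many files.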