Serving Moirai with BentoML

Moirai, the Masked Encoder-based Universal Time Series Forecasting Transformer, is a large time series model pre-trained on the LOTSA dataset. This BentoML example project demonstrates how to build a forecasting inference API for time-series data using Moirai-1.0-R-Large.

See here for a full list of BentoML example projects.

Install dependencies

git clone https://github.com/bentoml/BentoMoirai.git
cd BentoMoirai

# Python 3.11 is recommended
pip install -r requirements.txt

Run the BentoML Service

We have defined a BentoML Service in service.py. Run bentoml serve in your project directory to start the Service.

$ bentoml serve .

2024-01-08T09:07:28+0000 [INFO] [cli] Prometheus metrics for HTTP BentoServer from "service:Moirai" can be accessed at http://localhost:3000/metrics.
2024-01-08T09:07:28+0000 [INFO] [cli] Starting production HTTP BentoServer from "service:Moirai" listening on http://localhost:3000 (Press CTRL+C to quit)
Model Moirai loaded device: cuda
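
Loading the model can take a while, so a script that calls the Service right after startup may want to wait until the server reports readiness. The following is a minimal sketch, not part of the example project: it assumes the server answers on BentoML's /readyz health endpoint at the default address, and the helper names is_ready and wait_until_ready are hypothetical.

```python
import time
import urllib.error
import urllib.request


def is_ready(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET {base_url}/readyz answers with HTTP 200.

    Assumption: the server exposes a /readyz health endpoint.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/readyz", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: not ready yet.
        return False


def wait_until_ready(base_url: str, attempts: int = 30, delay: float = 1.0, probe=is_ready) -> bool:
    """Call the probe up to `attempts` times, sleeping `delay` seconds between tries."""
    for _ in range(attempts):
        if probe(base_url):
            return True
        time.sleep(delay)
    return False


# Example usage (requires a running server):
# wait_until_ready("http://localhost:3000")
```

The probe is passed in as a parameter so the retry loop can be exercised without a running server.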

The Service is accessible at http://localhost:3000. You can interact with it using the Swagger UI or in other ways:

CURL

curl -s \
     -X POST \
     -F 'csv=@data.csv' \
     http://localhost:3000/forecast_csv
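
Where curl is not available, the same multipart upload can be sketched with the Python standard library alone. This is an illustrative equivalent of the command above, not code from the example project: it reuses the csv form field and the /forecast_csv path from the curl call, while the helper names build_multipart and forecast_csv are hypothetical and the response is returned as raw bytes because its format is not described here.

```python
import os
import urllib.request
import uuid


def build_multipart(field: str, filename: str, payload: bytes, content_type: str = "text/csv"):
    """Build a single-file multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def forecast_csv(path: str, base_url: str = "http://localhost:3000") -> bytes:
    """POST a CSV file to /forecast_csv and return the raw response body."""
    with open(path, "rb") as f:
        body, header = build_multipart("csv", os.path.basename(path), f.read())
    request = urllib.request.Request(
        f"{base_url}/forecast_csv",
        data=body,
        headers={"Content-Type": header},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return resp.read()


# Example usage (requires a running server and a local data.csv):
# print(forecast_csv("data.csv"))
```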

Python client

import bentoml
import pandas as pd

df = pd.read_csv("data.csv")

with bentoml.SyncHTTPClient("http://localhost:3000") as client:
    result = client.forecast(df=df)

Deploy to BentoCloud

After the Service is ready, you can deploy the application to BentoCloud for better management and scalability. Sign up if you don't have a BentoCloud account yet.

Make sure you have logged in to BentoCloud, then run the following command from the project directory to deploy the Service.

bentoml deploy .

Once the application is up and running on BentoCloud, you can access it via the exposed URL.