MLflow Automatic Wind Power Generation Forecasting System

Repository for Assignment 3 of the course Large Scale Data Analysis. It explores using the MLflow framework for organised, reproducible and maintainable machine learning experiments.

Running this Project

An MLflow project can generally be run in two ways. The first is to clone this repository and manually resolve the dependencies specified in conda.yml. The project can then be run by executing the script directly.

git clone
conda env create -f conda.yml
conda activate wind-power-forecast

Instead of resolving the dependencies manually, it is also possible to run the project pipeline as specified in the MLproject file. Running

mlflow run .

from the root directory of this project first resolves the environment and then runs the entire project pipeline.

If you don't wish to clone the project, mlflow run can also fetch and run it directly from GitHub over SSH using the following command:

mlflow run

Viewing Results

All experiment results are saved in an Azure Machine Learning environment. After cloning, however, the project can easily be run locally, in which case the results are saved to the filesystem in the directory mlruns. Running mlflow ui from the root of the project starts an MLflow tracking server with a user interface for viewing and comparing the results of the different experiments.

mlflow ui 
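The locally saved results can also be inspected without the user interface. The following is a minimal sketch using only the Python standard library. It assumes the local file-store layout mlruns/<experiment_id>/<run_id>/metrics/<metric_name>, with each line of a metric file starting with a timestamp followed by the value. The function names and the metric name rmse used in the usage note are illustrative assumptions, not part of this project.

```python
from pathlib import Path


def last_metric_values(mlruns_dir="mlruns"):
    """Collect the last logged value of every metric for every run.

    Assumption: metrics are stored as plain-text files under
    <mlruns_dir>/<experiment_id>/<run_id>/metrics/<metric_name>,
    one line per logged value in the form "<timestamp> <value> ...".
    """
    results = {}
    for metric_file in Path(mlruns_dir).glob("*/*/metrics/*"):
        if not metric_file.is_file():
            continue
        lines = [line for line in metric_file.read_text().splitlines() if line.strip()]
        if not lines:
            continue
        # The run id is the directory two levels above the metric file.
        run_id = metric_file.parent.parent.name
        # The value is the second whitespace-separated field of the last line.
        value = float(lines[-1].split()[1])
        results.setdefault(run_id, {})[metric_file.name] = value
    return results


def rank_runs(results, metric, lower_is_better=True):
    """Return (run_id, value) pairs for one metric, sorted from best to worst."""
    scored = [
        (run_id, metrics[metric])
        for run_id, metrics in results.items()
        if metric in metrics
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=not lower_is_better)
```

For example, rank_runs(last_metric_values("mlruns"), "rmse") would list the runs by a hypothetical error metric named rmse, lowest first. Substitute whichever metric the experiments actually log.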

Serving Model

To serve the model as a REST API, run the following command after having run the entire project workflow. This serves the currently best-performing model (metadata stored in best_model.json, MLflow artifact in the directory best_model). By default, the model is hosted locally.

mlflow models serve -m best_model [--no-conda]

Get predictions from the model by running

curl -H 'Content-Type: application/json'\
  -d '{"columns": ["Speed", "Direction"], "data": [[5,"S"]]}'
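The same request can be sent from Python. The following is a minimal sketch using only the standard library. The function names build_payload and predict are illustrative assumptions, and the address of the served model is deliberately left to the caller as the url argument.

```python
import json
import urllib.request


def build_payload(speed, direction):
    """Return the JSON body from the curl example: one row with Speed and Direction."""
    return json.dumps(
        {"columns": ["Speed", "Direction"], "data": [[speed, direction]]}
    )


def predict(url, speed, direction, timeout=10):
    """POST one observation to a served model and return the decoded JSON response.

    `url` is the scoring address of the served model and must be supplied
    by the caller; it is not hard-coded here.
    """
    request = urllib.request.Request(
        url,
        data=build_payload(speed, direction).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling predict(url, 5, "S") sends the same wind speed and direction as the curl command. Keeping build_payload separate makes it possible to check the request body without a running server.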

Testing the Model

The model is currently hosted on an Azure virtual machine as a background process on port 5000. It can be queried live with the following command:

curl -H 'Content-Type: application/json' -d '{"columns": ["Speed", "Direction"], "data": [[5,"S"]]}'