Kedro Tutorial: Image Classification

This project demonstrates how to use Kedro for image classification. The tutorial uses the Ships in Satellite Imagery dataset from Kaggle. The dataset contains 4000 images in total: 1000 images that contain a ship (positive class) and 3000 images that contain no ship or only part of one (negative class).

Setup

  1. Clone this project locally
  2. Install dependencies:
$ uv sync -p 3.11 --extra dev
  3. Download the dataset from Kaggle and place it in the data/01_raw directory

Set up MinIO (optional)

  1. Spin up MinIO:
$ docker compose up -d
  2. Create a data bucket:
$ mc alias set myminio http://127.0.0.1:9010 minioadmin minioadmin
$ mc mb myminio/data
  3. Add the MinIO credentials to local/credentials.yml:
minio_credentials:
    key: minioadmin
    secret: minioadmin
    client_kwargs:
        endpoint_url: http://127.0.0.1:9010
  4. Update the catalog.yaml datasets to point at the bucket:
accuracy_plot:
  type: plotly.JSONDataset
  filepath: s3://data/08_reporting/accuracy_plot.json
  credentials: minio_credentials

loss_plot:
  type: plotly.JSONDataset
  filepath: s3://data/08_reporting/loss_plot.json
  credentials: minio_credentials
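
To check that the bucket is reachable before running anything, it can help to know what happens to the credentials entry. As far as I know, Kedro datasets hand the resolved credentials to fsspec as keyword arguments, so the same dictionary can be passed to fsspec directly. The sketch below assumes fsspec and s3fs are installed and that MinIO is running on the endpoint used above; bucket_exists is a hypothetical helper, not part of this project.

```python
# Mirrors the minio_credentials entry shown above.
MINIO_CREDENTIALS = {
    "key": "minioadmin",
    "secret": "minioadmin",
    "client_kwargs": {"endpoint_url": "http://127.0.0.1:9010"},
}


def bucket_exists(bucket: str = "data") -> bool:
    """Return True if the bucket can be reached with the credentials above.

    Hypothetical helper for a quick connectivity check.
    """
    # Imported here so this module loads even when fsspec/s3fs are missing.
    import fsspec

    # Assumption: this is equivalent to what a dataset with
    # `credentials: minio_credentials` does when it opens an s3:// filepath.
    fs = fsspec.filesystem("s3", **MINIO_CREDENTIALS)
    return fs.exists(bucket)
```

Calling bucket_exists() with MinIO running should return True once the data bucket has been created with mc mb.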

Part 1: Introduction to the project and Kedro concepts

In the first part of the tutorial, we'll walk through the raw data science code and see how to refactor it to make it more modular and reusable. We'll also start using Kedro as a library, exploring and processing the data with the DataCatalog and OmegaConfigLoader components.

This project contains three notebooks:

  • notebook_raw.ipynb: This notebook contains the raw, unstructured code for the image classification task.
  • notebook_refactor.ipynb: This notebook contains refactored code that is organized into methods and separates some of the configuration from the logic.
  • notebook_kedro.ipynb: This notebook introduces Kedro's DataCatalog and OmegaConfigLoader to load and process the data.
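
As a rough idea of what using Kedro as a library looks like, here is a minimal sketch that loads the project configuration with OmegaConfigLoader and builds a DataCatalog from it. It assumes the code runs from the project root and that the configuration lives in a conf directory with base and local environments; build_catalog and CONF_SOURCE are illustrative names, and the notebook itself may do this differently.

```python
from pathlib import Path

# Assumption: configuration lives in ./conf with "base" and "local" folders.
CONF_SOURCE = Path("conf")


def build_catalog(conf_source=CONF_SOURCE):
    """Build a DataCatalog from the catalog and credentials config files."""
    # Imported inside the function so the sketch can be imported
    # without Kedro installed.
    from kedro.config import MissingConfigException, OmegaConfigLoader
    from kedro.io import DataCatalog

    loader = OmegaConfigLoader(
        conf_source=str(conf_source),
        base_env="base",
        default_run_env="local",
    )
    catalog_conf = loader["catalog"]
    try:
        credentials = loader["credentials"]
    except MissingConfigException:
        # No credentials file found, e.g. the MinIO setup was skipped.
        credentials = {}
    return DataCatalog.from_config(catalog_conf, credentials=credentials)
```

In a notebook, catalog = build_catalog() followed by catalog.load("<dataset name>") would then return the data for any entry defined in the catalog file.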

Part 2: Introduction to Kedro framework

In this part, we'll look at the Kedro project structure and at how to move the image classification code from notebooks into pipelines.
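
To make the move from notebook cells to pipelines more concrete: a Kedro node wraps a plain Python function, and a pipeline wires nodes together by the names of their inputs and outputs. The sketch below is illustrative only. The function split_indices, the parameter names and the dataset names are assumptions, not the ones used in this project.

```python
import random


def split_indices(n_samples, test_fraction, seed):
    """Shuffle sample indices and split them into train and test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def create_pipeline(**kwargs):
    """Wrap the function above in a one-node Kedro pipeline."""
    # Imported here so split_indices can be tried without Kedro installed.
    from kedro.pipeline import Pipeline, node

    return Pipeline(
        [
            node(
                func=split_indices,
                # Hypothetical entries in parameters.yml.
                inputs=["params:n_samples", "params:test_fraction", "params:seed"],
                # Hypothetical dataset names; Kedro keeps them in memory
                # unless they are declared in the catalog.
                outputs=["train_indices", "test_indices"],
                name="split_indices_node",
            )
        ]
    )
```

Because the node function has no Kedro dependency, it can still be called and tested on its own, for example split_indices(4000, 0.2, 42).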

The following documentation pages will be helpful to follow along:

Part 3: Experiment tracking with Kedro and MLflow

In this part, you will learn how Kedro can be used to track experiments with the help of MLflow. We'll use the kedro-mlflow plugin to log metrics, parameters, and artifacts to MLflow. The Kedro docs include a tutorial on adding MLflow to your Kedro workflow.
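
As a small illustration of metric logging, a node can call the MLflow API directly. The sketch assumes mlflow is installed and that a run is active when the node executes, which to my understanding kedro-mlflow arranges when the pipeline starts. The functions accuracy and evaluate_model and the metric name are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def evaluate_model(y_true, y_pred):
    """Hypothetical node: compute accuracy and log it to the active MLflow run."""
    # Imported here so accuracy() can be used without MLflow installed.
    import mlflow

    acc = accuracy(y_true, y_pred)
    mlflow.log_metric("test_accuracy", acc)
    return {"test_accuracy": acc}
```

Keeping the metric computation in a separate pure function means it can be unit tested without a tracking server.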

Part 4: Deploying a Kedro project on TBC