Kedro-Neptune plugin

Main docs page for Kedro-Neptune plugin

See this example in Neptune

Kedro pipeline metadata in a custom dashboard in the Neptune UI

What will you get with this integration?

Kedro is a popular open-source project that helps standardize ML workflows. It gives you a clean and powerful pipeline abstraction where you put all your ML code logic.

The Kedro-Neptune plugin combines a well-organized Kedro pipeline with a user interface built for ML metadata management. With it you can:

  • browse, filter, and sort your model training runs
  • compare nodes and pipelines on metrics, visual node outputs, and more
  • display all pipeline metadata, including learning curves for metrics, plots and images, rich media like video and audio, and interactive visualizations from Plotly, Altair, or Bokeh
  • and do whatever else you would expect from a modern ML metadata store

Installation

Before you start, make sure that:

Install neptune-client, kedro, and kedro-neptune

Depending on your operating system, open a terminal or CMD and run this command. All required libraries are available via pip and conda:

pip install neptune-client kedro kedro-neptune

For more, see installing neptune-client.
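To confirm that the install succeeded, a quick check along these lines may help. It is a minimal sketch using only the standard library; the helper name installed_versions is hypothetical, and the distribution names are simply the ones from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the pip command above.
REQUIRED = ("neptune-client", "kedro", "kedro-neptune")


def installed_versions(names=REQUIRED):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Any entry reported as NOT INSTALLED means the corresponding package still needs to be installed in the active environment.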

Quickstart

See code examples on GitHub

See runs logged to Neptune

This quickstart will show you how to:

  • Connect Neptune to your Kedro project
  • Log pipeline and dataset metadata to Neptune
  • Add explicit metadata logging to a node in your pipeline
  • Explore logged metadata in the Neptune UI

Before you start

Step 1: Create a Kedro project from "pandas-iris" starter

kedro new --starter=pandas-iris
  • Follow the instructions and choose a name for your Kedro project, for example "Great-Kedro-Project"
  • Go to your new Kedro project directory

If everything was set up correctly you should see the following directory structure:

Great-Kedro-Project # Parent directory of the template
├── conf            # Project configuration files
├── data            # Local project data (not committed to version control)
├── docs            # Project documentation
├── logs            # Project output logs (not committed to version control)
├── notebooks       # Project related Jupyter notebooks (can be used for experimental code before moving the code to src)
├── README.md       # Project README
├── setup.cfg       # Configuration options for `pytest` when doing `kedro test` and for the `isort` utility when doing `kedro lint`
├── src             # Project source code
    ├── pipelines   
        ├── data_science
            ├── nodes.py
            ├── pipelines.py
            └── ...

You will use nodes.py and pipelines.py files in this quickstart.

Step 2: Initialize kedro-neptune plugin

  • Go to your Kedro project directory and run
kedro neptune init

The command line will ask for your Neptune API token

  • Input your Neptune API token:
    • Press enter if the token is stored in the NEPTUNE_API_TOKEN environment variable
    • Pass the name of a different environment variable that holds your Neptune API token, for example MY_SPECIAL_NEPTUNE_TOKEN_VARIABLE
    • Pass your Neptune API token as a string

The command line will ask for your Neptune project name

  • Input your Neptune project name:
    • Press enter if the project name is stored in the NEPTUNE_PROJECT environment variable
    • Pass the name of a different environment variable that holds your Neptune project name, for example MY_SPECIAL_NEPTUNE_PROJECT_VARIABLE
    • Pass your project name as a string in the format WORKSPACE/PROJECT

If everything was set up correctly, you should:

  • see the message: "kedro-neptune plugin successfully configured"
  • see three new files in your Kedro project:
    • Credentials file: YOUR_KEDRO_PROJECT/conf/local/credentials_neptune.yml
    • Config file: YOUR_KEDRO_PROJECT/conf/base/neptune.yml
    • Catalog file: YOUR_KEDRO_PROJECT/conf/base/neptune_catalog.yml

You can always go to those files and change the initial configuration.
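To check that the three files listed above were created, a small script like the following could be used. This is a sketch only: the helper name missing_neptune_files is hypothetical, and the relative paths are copied from the list above.

```python
from pathlib import Path

# Relative paths of the files listed above.
EXPECTED_FILES = (
    "conf/local/credentials_neptune.yml",
    "conf/base/neptune.yml",
    "conf/base/neptune_catalog.yml",
)


def missing_neptune_files(project_root):
    """Return the expected config files that do not exist under project_root."""
    root = Path(project_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # Run from the Kedro project directory.
    missing = missing_neptune_files(".")
    print("All files present" if not missing else f"Missing: {missing}")
```

An empty result suggests the initialization step completed; any listed path points to a file that was not generated.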

Step 3: Add Neptune logging to a Kedro node

  • Go to the pipeline node file src/KEDRO_PROJECT/pipelines/data_science/nodes.py
  • Import the Neptune client near the top of nodes.py
import neptune.new as neptune
  • Add a neptune_run argument of type neptune.run.Handler to the report_accuracy function
def report_accuracy(predictions: np.ndarray, test_y: pd.DataFrame, 
                    neptune_run: neptune.run.Handler) -> None:
...

You can treat neptune_run like a normal Neptune Run and log any ML metadata to it.

Important
You must use the exact name "neptune_run" to access the Neptune Run handler in Kedro pipelines.

  • Log metrics like accuracy to neptune_run
def report_accuracy(predictions: np.ndarray, test_y: pd.DataFrame, 
                    neptune_run: neptune.run.Handler) -> None:
    target = np.argmax(test_y.to_numpy(), axis=1)
    accuracy = np.sum(predictions == target) / target.shape[0]
    
    neptune_run['nodes/report/accuracy'] = accuracy * 100

You can log metadata from any node to any Neptune namespace you want.
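The accuracy arithmetic and the namespace path can be tried without Neptune or NumPy. The sketch below uses a hypothetical StubRun class in place of the real handler; it assumes, based on the kedro/nodes/report/accuracy path shown in Step 6, that everything written to neptune_run lands under a kedro prefix. It is an illustration, not the plugin's implementation.

```python
class StubRun:
    """Hypothetical stand-in for the neptune_run handler; stores values in a dict.

    Assumption: the handler is rooted at the "kedro" namespace, as the paths
    shown in the Neptune UI suggest.
    """

    def __init__(self, prefix="kedro"):
        self.prefix = prefix
        self.logged = {}

    def __setitem__(self, path, value):
        self.logged[f"{self.prefix}/{path}"] = value


def argmax(row):
    """Index of the largest value in a row (the first one wins on ties)."""
    return max(range(len(row)), key=row.__getitem__)


def report_accuracy(predictions, test_y, neptune_run):
    # test_y is one-hot encoded, so argmax recovers the class index per row.
    target = [argmax(row) for row in test_y]
    correct = sum(p == t for p, t in zip(predictions, target))
    accuracy = correct / len(target)
    neptune_run["nodes/report/accuracy"] = accuracy * 100


run = StubRun()
one_hot_labels = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
report_accuracy([0, 1, 2, 2], one_hot_labels, run)
print(run.logged)  # {'kedro/nodes/report/accuracy': 75.0}
```

Three of the four predictions match their labels, so the logged value is 75.0.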

  • Log images like a confusion matrix to neptune_run
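import matplotlib.pyplot as plt
# Assumption: plot_confusion_matrix comes from scikit-plot,
# whose (y_true, y_pred, ax=...) signature matches the call below.
from scikitplot.metrics import plot_confusion_matrix

def report_accuracy(predictions: np.ndarray, test_y: pd.DataFrame, 
                    neptune_run: neptune.run.Handler) -> None:
    target = np.argmax(test_y.to_numpy(), axis=1)
    accuracy = np.sum(predictions == target) / target.shape[0]
    
    # Keep logging the accuracy from the previous step
    neptune_run['nodes/report/accuracy'] = accuracy * 100

    # Draw the confusion matrix on a matplotlib figure and upload it
    fig, ax = plt.subplots()
    plot_confusion_matrix(target, predictions, ax=ax)
    neptune_run['nodes/report/confusion_matrix'].upload(fig)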

Note
You can log metrics, text, images, video, interactive visualizations, and more.
See a full list of What you can log and display in Neptune.

Step 4: Add Neptune Run handler to the Kedro pipeline

  • Go to a pipeline definition, src/KEDRO_PROJECT/pipelines/data_science/pipelines.py
  • Add the neptune_run Run handler as an input to the report node
node(
    report_accuracy,
    ["example_predictions", "example_test_y", "neptune_run"],
    None,
    name="report"),
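When a node's inputs are given as a list, Kedro passes them to the function positionally, which is why "neptune_run" is the third entry, matching the third parameter of report_accuracy. The sketch below imitates that mapping with a hypothetical run_node helper and placeholder values; it is a simplification for illustration, not Kedro's actual runner.

```python
def run_node(func, input_names, catalog):
    """Look up each input name in the catalog and pass the values positionally."""
    args = [catalog[name] for name in input_names]
    return func(*args)


def report_accuracy(predictions, test_y, neptune_run):
    # Record which object arrived in which parameter.
    return {"predictions": predictions, "test_y": test_y, "neptune_run": neptune_run}


# Placeholder values; in a real project Kedro and the plugin supply these.
catalog = {
    "example_predictions": "<predictions>",
    "example_test_y": "<test_y>",
    "neptune_run": "<run handler>",
}

received = run_node(
    report_accuracy,
    ["example_predictions", "example_test_y", "neptune_run"],
    catalog,
)
print(received["neptune_run"])  # <run handler>
```

Reordering the list would hand the values to the wrong parameters, so the order of the input names has to follow the function signature.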

Step 5: Run Kedro pipeline

Go to your console and execute your Kedro pipeline:

kedro run

A link to the Neptune Run associated with the Kedro pipeline execution will be printed to the console.

Step 6: Explore results in the Neptune UI

  • Click on the Neptune Run link in your console or use an example link

https://app.neptune.ai/common/kedro-integration/e/KED-632

Default Kedro namespace in Neptune UI

  • See pipeline and node parameters in kedro/catalog/parameters

Pipeline parameters logged from Kedro to Neptune UI

  • See execution parameters in kedro/run_params

Execution parameters logged from Kedro to Neptune UI

  • See metadata about the datasets in kedro/catalog/datasets/example_iris_data

Dataset metadata logged from Kedro to Neptune UI

  • See the metrics (accuracy) you logged explicitly under kedro/nodes/report/accuracy

Metrics logged from Kedro to Neptune UI

  • See the charts (confusion matrix) you logged explicitly under kedro/nodes/report/confusion_matrix

Confusion matrix logged from Kedro to Neptune UI

See also