KInference

KInference is a library that makes it possible to run complex ML models (exported to the ONNX format) in Kotlin.

ONNX is a popular ecosystem for building, training, evaluating, and exchanging ML and DL models. It simplifies the process by dividing a model into building blocks that can be swapped or tuned to one's liking.

However, popular ML libraries, including those intended for inference of ONNX models, come with many dependencies and requirements that complicate their use in some cases. KInference is designed to facilitate inference of ONNX models on a variety of platforms via configurable backends. The library addresses both server-side and local inference, and provides several solutions suitable for running on the user side as well as on the server side.

Right now, KInference is in active development.

Why should I use KInference?

KInference is specifically optimized for inference. Most existing ML libraries are versatile tools for both model training and inference, and that versatility brings many dependencies and requirements. KInference, by contrast, focuses on inference only, offering a relatively small yet convenient API and inference-specific optimizations.
KInference has pure-JS and pure-JVM backends that make it possible to run models wherever a JavaScript engine or a JVM is available. In addition, you can switch between backends through backend configuration.
KInference supports configurable backends. KInference employs platform-specific optimizations and allows the per-target backend configuration that multiplatform projects need. You can choose a backend for every module in the build.gradle.kts project file just by adding the corresponding dependencies, while keeping most of your KInference-related code in a single common module.
KInference enables data preprocessing. Data usually needs preprocessing before it is fed to a model, which is why KInference implements NumPy-like n-dimensional arrays. KInference can also work with custom array formats, some of which are available out of the box (see multik, kmath).

KInference backends

KInference Core

Pure Kotlin implementation that requires nothing but vanilla Kotlin. KInference Core is lightweight but fast, and supports numerous ONNX operators. This makes it easy to use and especially convenient for applications that need to run models locally on users' machines. Note that this backend is well optimized for JVM projects only: although KInference Core is available for JavaScript projects, the KInference TensorFlow.js backend is highly recommended there for better performance.

KInference Core dependency coordinates:

dependencies {
    api("io.kinference", "inference-core", "0.2.26")
}

TensorFlow.js

High-performance JavaScript backend that relies on the TensorFlow.js library. Essentially, it uses GPU operations provided by TensorFlow.js to speed up computation. In addition, this implementation enables model execution directly in the user's browser. This backend is recommended for JavaScript projects.

TensorFlow.js backend dependency coordinates:

dependencies {
    api("io.kinference", "inference-tfjs", "0.2.26")
}

ONNXRuntime CPU and ONNXRuntime GPU

Java backends that use ONNXRuntime as the inference engine and expose the common KInference API for interacting with the ONNXRuntime library.

Note that the GPU backend is CUDA-only. For the system requirements, see the ONNXRuntime documentation on CUDA support.

ONNXRuntime CPU backend dependency coordinates:

dependencies {
    api("io.kinference", "inference-ort", "0.2.26")
}

ONNXRuntime GPU backend dependency coordinates:

dependencies {
    api("io.kinference", "inference-ort-gpu", "0.2.26")
}
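
Since the two ONNXRuntime backends differ only in their artifact name, the choice between them can be made at build time. The fragment below is a minimal sketch of that idea, not part of KInference itself: the `useGpu` project property is a hypothetical name introduced here for illustration.

```kotlin
// build.gradle.kts
// Hypothetical project property: pass -PuseGpu=true to select the CUDA backend.
val useGpu = (findProperty("useGpu") as String?)?.toBoolean() ?: false

dependencies {
    // Pick exactly one ONNXRuntime backend artifact (names taken from the coordinates above).
    val ortArtifact = if (useGpu) "inference-ort-gpu" else "inference-ort"
    api("io.kinference", ortArtifact, "0.2.26")
}
```

With this in place, a plain build would resolve the CPU backend, while `./gradlew build -PuseGpu=true` would resolve the GPU one.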

Third-party math library adapters

KInference works with custom array formats, and some of them are available out of the box. Data adapters let you keep working with familiar array formats from several third-party Kotlin math libraries. In addition to the library adapters listed below, you can implement your own adapters using the KInference adapters API.

KMath adapter

Array adapter for the kmath library that works with JVM KInference backends.

Dependency coordinates:

dependencies {
    api("io.kinference", "adapter-kmath-{backend_name}", "0.2.26")
}

Multik adapter

Array adapter for the multik library that works with JVM KInference backends.

Dependency coordinates:

dependencies {
    api("io.kinference", "adapter-multik-{backend_name}", "0.2.26")
}
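
As a sketch of how the `{backend_name}` placeholder might be filled in, the fragment below pairs the Multik adapter with the KInference Core backend. The artifact name `adapter-multik-core` is an assumption based on the naming pattern above (the suffix mirroring `inference-core`); check the repository for the exact coordinates before relying on it.

```kotlin
// build.gradle.kts
dependencies {
    // Backend that actually runs the model.
    api("io.kinference", "inference-core", "0.2.26")
    // Assumed adapter artifact: "{backend_name}" replaced with "core".
    api("io.kinference", "adapter-multik-core", "0.2.26")
}
```

Keeping the adapter and the backend on the same version is the safest choice here.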

Getting started

Let us now walk through how to get started with KInference. The latest version of KInference is 0.2.26.

Set up the dependency repositories

First, add the KInference repositories to build.gradle.kts:

repositories {
    maven {
        url = uri("https://packages.jetbrains.team/maven/p/ki/maven")
    }
    
    maven {
        url = uri("https://packages.jetbrains.team/maven/p/grazi/grazie-platform-public")
    }
}

Project setup

To enable a backend, add the chosen KInference runtime as a dependency:

dependencies {
    api("io.kinference", "inference-core", "0.2.26")
}
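
Put together, the repository and dependency steps could look like the following minimal build.gradle.kts for a JVM-only project. This is a sketch under stated assumptions: the Kotlin plugin version is illustrative only (use whichever version your project already targets), and `mavenCentral()` is added on the assumption that other dependencies are resolved from there.

```kotlin
// build.gradle.kts
plugins {
    // Assumption: illustrative Kotlin version, not a requirement stated by KInference.
    kotlin("jvm") version "1.9.22"
}

repositories {
    // Assumed source for the Kotlin standard library and other common dependencies.
    mavenCentral()
    // KInference repositories from the setup step above.
    maven { url = uri("https://packages.jetbrains.team/maven/p/ki/maven") }
    maven { url = uri("https://packages.jetbrains.team/maven/p/grazi/grazie-platform-public") }
}

dependencies {
    // `implementation` keeps KInference out of this module's public API;
    // use `api` instead if downstream modules need to see its types.
    implementation("io.kinference", "inference-core", "0.2.26")
}
```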

Multi-backend project setup

To configure an individual KInference backend for each target, add the corresponding backends to the dependencies of each source set:

kotlin {
    jvm {}

    js(IR) {
        browser()
    }

    sourceSets {
        val commonMain by getting {
            dependencies {
                api("io.kinference:inference-api:0.2.26")
                api("io.kinference:ndarray-api:0.2.26")
            }
        }

        val jvmMain by getting {
            dependencies {
                api("io.kinference:inference-core:0.2.26")
            }
        }

        val jsMain by getting {
            dependencies {
                api("io.kinference:inference-tfjs:0.2.26")
            }
        }
    }
}
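
Because every KInference artifact in a multi-backend build shares one version, it can help to declare that version once. The variant below is a sketch of the same setup with a single version constant (`kinferenceVersion` is a hypothetical name) and with the JVM target switched to the ONNXRuntime CPU backend, to show that swapping a backend only touches one source set.

```kotlin
// build.gradle.kts
// Hypothetical constant; any identifier works.
val kinferenceVersion = "0.2.26"

kotlin {
    jvm {}

    js(IR) {
        browser()
    }

    sourceSets {
        val commonMain by getting {
            dependencies {
                // Backend-independent API used by the shared code.
                api("io.kinference:inference-api:$kinferenceVersion")
                api("io.kinference:ndarray-api:$kinferenceVersion")
            }
        }

        val jvmMain by getting {
            dependencies {
                // ONNXRuntime CPU backend instead of KInference Core, JVM target only.
                api("io.kinference:inference-ort:$kinferenceVersion")
            }
        }

        val jsMain by getting {
            dependencies {
                api("io.kinference:inference-tfjs:$kinferenceVersion")
            }
        }
    }
}
```

The common source set is unchanged by the swap, which is the point of keeping KInference-related code in the common module.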

Examples

The examples module contains examples of solving classification (cats vs dogs) and text generation tasks, using different backends. The models were selected from the ONNX Model Zoo. Running the examples does not require converting models to different opsets. However, if you need to run a model with operator versions not supported by KInference, refer to the Convert guide.

Want to know more?

The KInference API itself is extensively documented, so you can explore its code and interfaces to get to know KInference better.

You can also submit feedback and ask questions in the repository issues and issue discussions.

JetBrains-Research/kinference