Finetuner

Finetuning any deep neural network for better embedding on neural search tasks

Finetuner allows one to tune the weights of any deep neural network for better embeddings on search tasks. It accompanies Jina to deliver the last mile of performance for domain-specific neural search applications.

🎛 Designed for finetuning: a human-in-the-loop deep learning tool for leveling up your pretrained models in domain-specific neural search applications.

🔱 Powerful yet intuitive: all you need is finetuner.fit(), a one-liner that unlocks rich features such as siamese/triplet networks, interactive labeling, layer pruning, weight freezing and dimensionality reduction.

⚛️ Framework-agnostic: promises an identical API and user experience across the PyTorch, TensorFlow/Keras and PaddlePaddle deep learning backends.

🧈 Jina integration: buttery smooth integration with Jina, reducing the cost of context switching between experimentation and production.
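A triplet network, one of the siamese-style setups mentioned above, embeds an anchor, a positive (relevant) item and a negative (irrelevant) item, and is trained so that the positive ends up closer to the anchor than the negative. Below is a minimal pure-Python sketch of the standard triplet margin loss. It illustrates the general technique only and is not Finetuner's internal implementation; the Euclidean distance and the `margin=1.0` default are assumptions.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin: float = 1.0) -> float:
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Toy 2-d embeddings: the positive sits close to the anchor, the negative far away.
anchor = [0.0, 0.0]
positive = [0.1, 0.0]
negative = [3.0, 4.0]

well_separated = triplet_loss(anchor, positive, negative)
swapped = triplet_loss(anchor, negative, positive)  # roles swapped: large positive loss
print(well_separated)  # 0.0
print(swapped)
```

During training, the gradient of this loss moves the embeddings only for triplets that still violate the margin, which is why already well-separated triplets contribute nothing.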

Install

Requires Python 3.7+ and one of PyTorch (>=1.9), TensorFlow (>=2.5) or PaddlePaddle, installed on Linux/macOS.

pip install finetuner
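Because Finetuner needs one of the three backends to be present, you may want to check which ones your environment already provides. The sketch below uses only the standard library; `installed_backends` is a hypothetical helper, not part of Finetuner, and it checks importability only, not the minimum versions listed above.

```python
import importlib.util

# Import names of the supported backends (PaddlePaddle is imported as `paddle`).
BACKENDS = {"torch": "PyTorch", "tensorflow": "TensorFlow/Keras", "paddle": "PaddlePaddle"}


def installed_backends():
    """Return the import names of the deep learning backends found in this environment."""
    return [name for name in BACKENDS if importlib.util.find_spec(name) is not None]


found = installed_backends()
if found:
    print("Backends available:", ", ".join(BACKENDS[n] for n in found))
else:
    print("No backend found: install PyTorch, TensorFlow or PaddlePaddle first.")
```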

Usage

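Which finetuner.fit() call you need depends on two questions: do you have an embedding model, and do you have labeled data?

                      Embedding model: Yes    Embedding model: No
Labeled data: Yes     🟠                      🟡
Labeled data: No      🟢                      🔵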
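The table boils down to two switches on finetuner.fit(): missing labels turn on interactive=True, and a model that does not output embeddings needs to_embedding_model=True plus an output_dim. The hypothetical helper below encodes exactly the four combinations shown in the examples that follow (output_dim=100 is the value those examples use). It only builds keyword arguments and does not call Finetuner.

```python
def fit_kwargs(has_embedding_model: bool, has_labeled_data: bool, output_dim: int = 100) -> dict:
    """Map the two questions in the table to the extra finetuner.fit() keyword
    arguments used in this README's examples (hypothetical helper, not part of Finetuner)."""
    kwargs = {}
    if not has_labeled_data:
        # 🟢 / 🔵: no labels yet, so label interactively in the browser
        kwargs["interactive"] = True
    if not has_embedding_model:
        # 🟡 / 🔵: convert a general model into an embedding model
        kwargs["to_embedding_model"] = True
        kwargs["output_dim"] = output_dim
    return kwargs


for model_ok in (True, False):
    for labels_ok in (True, False):
        print(f"embedding model={model_ok!s:5} labeled data={labels_ok!s:5} -> {fit_kwargs(model_ok, labels_ok)}")
```

For example, finetuner.fit(general_model, train_data=unlabeled_data, **fit_kwargs(False, False)) would reproduce the 🔵 call below.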

🟠 Have embedding model and labeled data

Perfect! You already have embed_model and labeled_data, so simply do:

import finetuner

finetuner.fit(
    embed_model,
    train_data=labeled_data
)
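Labeled data can be thought of as pairs of items marked relevant or irrelevant to each other, which is the kind of signal a siamese network learns from. The sketch below applies the classic contrastive loss to such pairs. The +1/-1 label convention, the cosine distance and the margin are assumptions for illustration; Finetuner's own data format (a Jina DocumentArray) and its loss functions may differ.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity: 0 for identical directions, 2 for opposite ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def contrastive_loss(emb_a, emb_b, label, margin=1.0):
    """Contrastive loss for one labeled pair.

    label = +1: relevant pair, penalise any distance (pull together).
    label = -1: irrelevant pair, penalise only distances below `margin` (push apart).
    """
    d = cosine_distance(emb_a, emb_b)
    if label == 1:
        return d ** 2
    return max(0.0, margin - d) ** 2


# Hypothetical labeled pairs: (embedding of item A, embedding of item B, label)
pairs = [
    ([1.0, 0.0], [0.9, 0.1], 1),    # relevant and already close: small loss
    ([1.0, 0.0], [-1.0, 0.0], -1),  # irrelevant and far apart: zero loss
    ([1.0, 0.0], [1.0, 0.0], -1),   # irrelevant but identical: maximal loss
]
mean_loss = sum(contrastive_loss(a, b, y) for a, b, y in pairs) / len(pairs)
print(f"mean contrastive loss: {mean_loss:.4f}")
```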

🟢 Have embedding model and unlabeled data

You have an embed_model to use, but no labeled data for finetuning it. No worries, that's already enough! You can use Finetuner to label data interactively and train embed_model as follows:

import finetuner

finetuner.fit(
    embed_model,
    train_data=unlabeled_data,
    interactive=True
)
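Conceptually, interactive mode is a human-in-the-loop cycle: the current model embeds the unlabeled items, the candidates closest to a query are shown to you, your relevance judgments become labeled data, and the model is trained on them before the next round. The pure-Python sketch below simulates the labeling half of one round. `labeling_round`, the cosine ranking and the `ask_human` callback standing in for the browser UI are all hypothetical and are not Finetuner's API.

```python
import math
from typing import Callable, Dict, List, Tuple

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def labeling_round(
    query_id: str,
    embeddings: Dict[str, Vector],
    ask_human: Callable[[str, str], bool],
    top_k: int = 3,
) -> List[Tuple[str, str, int]]:
    """Show the top_k nearest items to the query and record the human's judgments."""
    query = embeddings[query_id]
    candidates = sorted(
        (item for item in embeddings if item != query_id),
        key=lambda item: cosine_similarity(query, embeddings[item]),
        reverse=True,
    )[:top_k]
    # Each judgment becomes a labeled pair: +1 relevant, -1 irrelevant.
    return [(query_id, c, 1 if ask_human(query_id, c) else -1) for c in candidates]


# Toy data; the "human" is simulated by comparing hidden ground-truth identities.
embeddings = {"a1": [1.0, 0.1], "a2": [0.9, 0.2], "b1": [0.8, 0.6], "b2": [0.0, 1.0]}
identity = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
labels = labeling_round("a1", embeddings, lambda q, c: identity[q] == identity[c], top_k=2)
print(labels)  # [('a1', 'a2', 1), ('a1', 'b1', -1)]
```

In this toy round, b1 is ranked second by the current embeddings but judged irrelevant; such hard negatives are the labels most likely to change the model in the next training step.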

🟡 Have general model and labeled data

You have a general_model which does not output embeddings. Luckily, you have some labeled_data for training. No worries, Finetuner can convert your model into an embedding model and train it via:

import finetuner

finetuner.fit(
    general_model,
    train_data=labeled_data,
    to_embedding_model=True,
    output_dim=100
)
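One common way to turn a classifier into an embedding model is to cut off its final classification layer and attach a new dense layer that projects the remaining features down to output_dim dimensions. The pure-Python sketch below illustrates that idea with a model represented as a list of layer functions. The layer representation, the random initialisation and the standalone `to_embedding_model` function are assumptions for illustration; Finetuner's own conversion works on real PyTorch/Keras/Paddle layers and may choose the cut point differently.

```python
import random
from typing import Callable, List

Layer = Callable[[List[float]], List[float]]


def dense(weights: List[List[float]]) -> Layer:
    """A bias-free fully connected layer: output[j] = sum_i weights[j][i] * x[i]."""
    return lambda x: [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def relu(x: List[float]) -> List[float]:
    return [max(0.0, v) for v in x]


def to_embedding_model(layers: List[Layer], feature_dim: int, output_dim: int, seed: int = 0) -> List[Layer]:
    """Drop the final (classification) layer and append a randomly initialised
    projection from `feature_dim` features to `output_dim` embedding dimensions."""
    rng = random.Random(seed)
    projection = [[rng.gauss(0.0, feature_dim ** -0.5) for _ in range(feature_dim)] for _ in range(output_dim)]
    return layers[:-1] + [dense(projection)]


def run(layers: List[Layer], x: List[float]) -> List[float]:
    for layer in layers:
        x = layer(x)
    return x


# Toy "general model": 4 inputs -> 8 hidden features -> 10 class scores.
rng = random.Random(42)
hidden = dense([[rng.uniform(-1, 1) for _ in range(4)] for _ in range(8)])
classifier = dense([[rng.uniform(-1, 1) for _ in range(8)] for _ in range(10)])
general_model = [hidden, relu, classifier]

embed_model = to_embedding_model(general_model, feature_dim=8, output_dim=100)
print(len(run(general_model, [0.5, -0.2, 0.1, 0.9])))  # 10 class scores
print(len(run(embed_model, [0.5, -0.2, 0.1, 0.9])))    # 100-dimensional embedding
```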

🔵 Have general model and unlabeled data

You have a general_model which does not output embeddings, and you don't have labeled data for training either. No worries, Finetuner can help you train an embedding model with interactive labeling on the fly:

import finetuner

finetuner.fit(
    general_model,
    train_data=unlabeled_data,
    interactive=True,
    to_embedding_model=True,
    output_dim=100
)

Finetuning ResNet50 on CelebA

⚡ To get the best experience, you will need a GPU machine for this example. For CPU users, we provide examples of finetuning an MLP on FashionMNIST and a Bi-LSTM on CovidQA that run out of the box on low-profile machines. Check out more examples in our docs!

  1. Download the CelebA-small dataset (7.7 MB) and decompress it to './img_align_celeba'. The full dataset can be found here.
  2. Finetuner accepts a Jina DocumentArray/DocumentArrayMemmap, so we load the CelebA images into this format using a generator:
    from jina.types.document.generators import from_files
    
    def data_gen():
        for d in from_files('./img_align_celeba/*.jpg', size=100, to_dataturi=True):
            d.convert_image_datauri_to_blob(color_axis=0)  # `color_axis=-1` for TF/Keras users
            yield d
  3. Load a pretrained ResNet50 using PyTorch, Keras or Paddle:
    • PyTorch
      import torchvision
      model = torchvision.models.resnet50(pretrained=True)
    • Keras
      import tensorflow as tf
      model = tf.keras.applications.resnet50.ResNet50(weights='imagenet')
    • Paddle
      import paddle
      model = paddle.vision.models.resnet50(pretrained=True)
  4. Start the Finetuner:
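    import finetuner
    
    finetuner.fit(
        model=model,
        interactive=True,            # label data in the browser Labeler UI
        train_data=data_gen,         # the generator defined in step 2
        freeze=True,                 # keep the pretrained ResNet50 weights fixed
        to_embedding_model=True,     # convert ResNet50 into an embedding model
        input_size=(3, 224, 224),    # channels-first; Keras models expect channels-last, e.g. (224, 224, 3)
        output_dim=100               # dimension of the output embeddings
    )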
  5. After the model is downloaded and the data is loaded (about 20 seconds, depending on your network/CPU/GPU), your browser will open the Labeler UI. You can now label the relevance of celebrity faces with your mouse or keyboard. The ResNet50 model gets finetuned and improves as you label. On a CPU machine, each labeling round may take up to 20 seconds.
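Setting freeze=True is meant to keep the pretrained ResNet50 weights fixed, so that, as we understand it, only the layers added by to_embedding_model are updated while you label. The pure-Python sketch below shows the idea with a single SGD step that skips frozen parameter groups. The parameter names, learning rate and `sgd_step` helper are hypothetical and do not reflect how Finetuner applies freezing in each backend.

```python
from typing import Dict, List, Set


def sgd_step(
    params: Dict[str, List[float]],
    grads: Dict[str, List[float]],
    frozen: Set[str],
    lr: float = 0.1,
) -> Dict[str, List[float]]:
    """One plain SGD update that leaves every parameter group in `frozen` untouched."""
    updated = {}
    for name, values in params.items():
        if name in frozen:
            updated[name] = list(values)  # frozen: copied as-is, gradient ignored
        else:
            updated[name] = [v - lr * g for v, g in zip(values, grads[name])]
    return updated


# Hypothetical parameter groups: the pretrained backbone and the new embedding head.
params = {"backbone": [0.5, -0.3], "embedding_head": [0.2, 0.4]}
grads = {"backbone": [1.0, 1.0], "embedding_head": [1.0, -1.0]}
new_params = sgd_step(params, grads, frozen={"backbone"})
print(new_params)
```

Freezing the backbone also means no gradient work is wasted on its parameters, leaving the small new head as the only part that adapts to your labels.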

Finetuning ResNet50 on CelebA with interactive labeling

Join Us

Finetuner is backed by Jina AI and licensed under Apache-2.0. We are actively hiring AI engineers and solution engineers to build the next neural search ecosystem in open source.