findkit

A Python library for content-based information retrieval.

Goal

Provide utilities that make it easy to set up content-based information retrieval (CBIR) systems.

Summary

Modern deep learning models can often be used to extract features from different types of data, such as images and music.

On the other hand, there are similarity search methods based on k-nearest-neighbors algorithms.

This library aims to provide a unified interface to machine learning frameworks and nearest-neighbor indexing libraries, and to bridge the gap between them.
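
To make the similarity-search half of this concrete, here is a minimal sketch of exact k-nearest-neighbors search over feature vectors using cosine similarity. It is a plain-Python illustration, not findkit's code: the function names `cosine_similarity` and `nearest_neighbors` and the sample vectors are assumptions made up for this example. Indexing libraries approximate this kind of search so that it scales to large collections.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbors(query, vectors, k=3):
    """Return the positions of the k vectors most similar to the query (exact, brute force)."""
    scored = [(cosine_similarity(query, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # most similar first
    return [i for _, i in scored[:k]]


# Hypothetical feature vectors, e.g. produced by a model for four items.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
result = nearest_neighbors([1.0, 0.05], features, k=2)
print(result)
```

Brute-force search compares the query against every stored vector, which is why approximate indexes become attractive as the collection grows.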

Main pipeline

What's implemented

  • VectorLoader

    • FunctionVectorLoader
      • Audio
        • STFTVectorLoader (uses librosa)
        • EssentiaVectorLoader (uses features extracted with Essentia, planned)
    • Doc2VecLoader (planned)
  • FeatureExtractor

    • KerasFeatureExtractor
    • SklearnFeatureExtractor
    • MXNetFeatureExtractor (uses MXNet Module API, experimental)
    • GluonFeatureExtractor (planned)
  • Index

    • AnnoyIndex
    • NMSLibIndex (experimental)
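
The three component families above suggest a load, extract, index flow. The sketch below shows how such stages could fit together in plain Python. It is an assumption-laden illustration, not findkit's actual API: the method names (`load`, `extract`, `build`, `query`), the classes `L2NormExtractor` and `BruteForceIndex`, and the `char_counts` helper are hypothetical stand-ins for a real model and a real nearest-neighbors index.

```python
import math


class FunctionVectorLoader:
    """Hypothetical loader: wraps a plain function that maps a raw item to a vector."""

    def __init__(self, func):
        self.func = func

    def load(self, items):
        return [self.func(item) for item in items]


class L2NormExtractor:
    """Stand-in for a model-based feature extractor: scales each vector to unit length."""

    def extract(self, vectors):
        features = []
        for v in vectors:
            norm = math.sqrt(sum(x * x for x in v)) or 1.0  # avoid dividing by zero
            features.append([x / norm for x in v])
        return features


class BruteForceIndex:
    """Stand-in for an indexing library: exact search by Euclidean distance."""

    def __init__(self):
        self.items = []

    def build(self, features):
        self.items = list(features)

    def query(self, feature, k=1):
        dists = [(math.dist(feature, f), i) for i, f in enumerate(self.items)]
        return [i for _, i in sorted(dists)[:k]]  # closest first


def char_counts(text):
    """Toy vectorizer (assumption): counts of 'a' and 'b' in a string."""
    return [text.count("a"), text.count("b")]


loader = FunctionVectorLoader(char_counts)
extractor = L2NormExtractor()
index = BruteForceIndex()

# Build the index from a small corpus, then look up the items closest to a query.
corpus = ["aaab", "abbb", "aaaa"]
index.build(extractor.extract(loader.load(corpus)))
query_features = extractor.extract(loader.load(["aab"]))[0]
hits = index.query(query_features, k=2)
print([corpus[i] for i in hits])
```

Keeping each stage behind a small interface is what would let a Keras or scikit-learn extractor, or an Annoy-backed index, be swapped in without touching the rest of the flow.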

Examples

Useful links