
Primary language: Python. License: Apache License 2.0 (Apache-2.0).

Sparkling: multimodal distributed auto clustering

What is it

The aim of this framework is to provide a user-friendly interface for solving hard clustering problems.

Sparkling is built on Apache Spark, so it can process huge distributed datasets.

The framework implements its own data preprocessor and introduces three novel features:

  • Multimodal data support (any combination of tabular, image and text);
  • Automatic quality optimisation by means of reinforcement learning and Bayesian optimisation;
  • Measure (cluster validity index, CVI) recommendation based on a meta-learning method.
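For readers unfamiliar with the term, a cluster validity index scores a clustering without ground-truth labels, which is what makes automatic quality optimisation possible. The sketch below computes one well-known CVI, the silhouette coefficient, in plain Python. It is an illustration only: it does not use Sparkling's API, the function name `silhouette_score` is hypothetical, and nothing here implies which measures Sparkling itself implements or recommends.

```python
from math import dist  # Euclidean distance, Python 3.8+


def silhouette_score(points, labels):
    """Mean silhouette coefficient (range -1..1, higher means better-separated clusters).

    Illustrative helper, not part of Sparkling.
    """
    # Group points by their cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    total = 0.0
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton cluster contributes 0 by convention
        # a: mean distance to the other members of the point's own cluster.
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster.
        b = min(
            sum(dist(point, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Two obvious groups: a sensible labelling scores higher than a scrambled one.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
good = silhouette_score(points, [0, 0, 1, 1])
bad = silhouette_score(points, [0, 1, 0, 1])
print(good > bad)
```

Different indices reward different cluster shapes, which is why choosing the measure for a given dataset is a problem in its own right.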

Contents