
Tool for labeling images

Image Clustering and Segmenting: icas



Tool for clustering mixed images. Further details about classes can be found here. The main image clustering pipeline operates as follows:

The process starts with a folder full of mixed images:
(Figure: whole dataset)

1- For each batch:

  • image features are obtained with one of these five methods:

    • SSIM: the image itself is used as the feature
    • minhash: a vector of the pixel locations of the most prominent corners is used as the feature. Corners are detected with cornerHarris from OpenCV
    • imagehash: the perceptual hash of the image is used as the feature; phash is calculated with the ImageHash library
    • ORB: the image's ORB features are used as the feature. Features are calculated with the ORB class from OpenCV
    • TM: the image itself is used as the feature
  • similarities are calculated with the selected method's similarity calculation:

    Two images with a similarity score greater than the threshold are considered similar. The threshold can be given as a parameter or selected interactively from a set of computations over a small data sample. The Y axis is the threshold value and the X axis is the number of expected similar pairs at that threshold. The approximate number of clustered images is calculated and displayed when a threshold value is hovered over:
    (Interactive selection can be problematic in notebooks or with different GUI backends)

  • similar images are clustered with the following logic: "If images X and Y are similar, they are put into the same cluster. If images X and Y are connected by a chain X-A-B-...-N-Y in which every consecutive pair is similar, they are also put into the same cluster."

  • all clusters and outliers are written into the batch's folder
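The per-batch steps above can be sketched end to end in plain NumPy. This is an illustration under stated assumptions, not icas's code: an average hash stands in for the pHash that ImageHash computes (a simpler, swapped-in technique), similarity is the fraction of matching hash bits, `threshold_curve` counts the pairs above each candidate threshold, and the chaining rule is implemented with union-find, so clusters are the connected components of the "similar" graph. All function names are hypothetical, and the sketch compares every pair for simplicity, which is exactly the cost the clustering pipeline is meant to reduce.

```python
from itertools import combinations

import numpy as np

def average_hash(gray, hash_size=8):
    """Average hash of a 2-D grayscale array (a simpler stand-in for pHash)."""
    bh, bw = gray.shape[0] // hash_size, gray.shape[1] // hash_size
    cells = gray[: bh * hash_size, : bw * hash_size].astype(float)
    cells = cells.reshape(hash_size, bh, hash_size, bw).mean(axis=(1, 3))
    return (cells > cells.mean()).flatten()  # one bit per cell

def similarity(h1, h2):
    """1.0 for identical hashes, 0.0 when every bit differs."""
    return 1.0 - np.count_nonzero(h1 != h2) / h1.size

def threshold_curve(scores, thresholds):
    """(threshold, similar pairs, images touched by those pairs) per threshold."""
    curve = []
    for t in thresholds:
        similar = [pair for pair, s in scores.items() if s > t]
        curve.append((t, len(similar), len({img for pair in similar for img in pair})))
    return curve

def cluster_batch(features, threshold):
    """Connected components of the graph whose edges are pairs scoring above threshold."""
    parent = {name: name for name in features}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    scores = {(a, b): similarity(features[a], features[b])
              for a, b in combinations(features, 2)}
    for (a, b), s in scores.items():
        if s > threshold:
            parent[find(a)] = find(b)  # X-A-...-Y chains end up with one root
    groups = {}
    for name in features:
        groups.setdefault(find(name), []).append(name)
    clusters = [g for g in groups.values() if len(g) > 1]
    outliers = [g[0] for g in groups.values() if len(g) == 1]
    return clusters, outliers, scores

rng = np.random.default_rng(0)
base = rng.integers(0, 200, size=(64, 64))
features = {
    "a": average_hash(base),
    "b": average_hash(base + 1),  # uniformly brighter copy of "a"
    "c": average_hash(rng.integers(0, 200, size=(64, 64))),  # unrelated image
}
clusters, outliers, scores = cluster_batch(features, threshold=0.9)
print(clusters, outliers)
print(threshold_curve(scores, [0.3, 0.6, 0.9]))
```

A brightness shift leaves the average hash unchanged, so "a" and "b" land in one cluster while the unrelated image stays an outlier.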

After the first step, a folder full of batch folders is created, along with the computed image similarities:
(Figure: mid result)
Sample batch folder content (each cluster folder contains the images found similar):
(Figure: mid result)

2- Merging batch folders

  • the first image of every cluster folder in every batch folder is selected as the "representative" of that cluster
  • representative features are obtained
  • similarities between these representatives are calculated
  • similar representatives are clustered
  • cluster folders are merged according to the cluster their representatives belong to
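The merge step can be sketched in the same spirit: take the first image of each batch-level cluster as its representative, cluster the representatives with the chaining rule, and merge the clusters whose representatives end up together. In icas these clusters are folders on disk; the nested-dict layout, the `merge_batches` name and the `is_similar` callback here are hypothetical, and representatives are simply compared pairwise.

```python
from itertools import combinations

def merge_batches(batches, is_similar):
    """Merge batch-level clusters through their representatives.

    batches: {batch_name: {cluster_name: [image, ...]}} (hypothetical layout).
    is_similar: callback deciding whether two images are similar.
    Returns a list of merged clusters (lists of images).
    """
    # The first image of each cluster is its representative.
    reps = [(imgs[0], imgs) for clusters in batches.values() for imgs in clusters.values()]
    parent = list(range(len(reps)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Compare representatives and union the similar ones (chains are transitive).
    for i, j in combinations(range(len(reps)), 2):
        if is_similar(reps[i][0], reps[j][0]):
            parent[find(i)] = find(j)

    # Merge cluster contents according to the cluster of their representative.
    merged = {}
    for i, (_, imgs) in enumerate(reps):
        merged.setdefault(find(i), []).extend(imgs)
    return list(merged.values())

batches = {
    "batch_0": {"cluster_0": ["cat_1", "cat_2"], "cluster_1": ["dog_1", "dog_2"]},
    "batch_1": {"cluster_0": ["dog_3"], "cluster_1": ["cat_3", "cat_4"]},
}
same_prefix = lambda a, b: a.split("_")[0] == b.split("_")[0]  # toy similarity
print(merge_batches(batches, same_prefix))
```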

After the second step, all batch folders are merged into one result folder:
(Figure: result folder)
Sample cluster folder content:
(Figure: mid result)

The deep-learning-supported image clustering pipeline operates as follows:

1- For whole dataset:

  • If there isn't already a trained model, a deep learning model is trained with one of the following loss functions:
    • PyTorch MSELoss
    • PyTorch L1Loss
    • perceptual loss, obtained by passing both the deep learning model's input and output to another feature extractor model (default is torchvision VGG19) and then calculating the mean of the feature difference.
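A framework-agnostic sketch of the perceptual loss idea, written with NumPy so it runs without PyTorch: the input and the reconstruction both pass through a fixed feature extractor, and the loss is the mean difference of the features. The toy extractor (2x2 average pooling) is a hypothetical stand-in for VGG19, and taking the absolute difference is an assumption, since the text only says "mean of the feature difference".

```python
import numpy as np

def toy_extractor(img: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a pretrained network such as VGG19: 2x2 average pooling."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def perceptual_loss(model_input, model_output, extractor=toy_extractor):
    """Mean absolute difference between the extractor's features of input and output."""
    return float(np.mean(np.abs(extractor(model_input) - extractor(model_output))))

x = np.ones((4, 4))
checker = np.tile(np.array([[0.5, -0.5], [-0.5, 0.5]]), (2, 2))
print(perceptual_loss(x, x + checker))  # the change averages out in the features
print(np.mean(np.abs(checker)))         # pixel-wise L1 still sees it
```

The checkerboard shows the design difference from `L1Loss`/`MSELoss`: a perceptual loss only penalizes changes that the feature extractor can see.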

2- For each batch:

3- Merging batch folders

  • the first image of every cluster folder in every batch folder is selected as the "representative" of that cluster
  • representative features are obtained
  • the selected clustering models are created according to the given parameters
  • all models are evaluated and the best model is selected
  • representatives are clustered with the best model
  • cluster folders are merged according to the cluster their representatives belong to
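The text does not say which metric is used to pick the best model. The sketch below assumes the silhouette score, a common choice, computed in plain NumPy; scikit-learn (listed among the optional installs) provides `sklearn.metrics.silhouette_score` for the same metric. The two candidate "models" are hypothetical labeling functions standing in for configured clustering models.

```python
import numpy as np

def silhouette(X, labels):
    """Mean silhouette score (higher is better); plain-NumPy version."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    clusters = np.unique(labels)
    if not 1 < len(clusters) < len(X):
        return -1.0  # undefined for one cluster or all singletons; treat as worst
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = D[i, same].sum() / (same.sum() - 1)  # mean distance to own cluster, self excluded
        b = min(D[i, labels == k].mean() for k in clusters if k != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

def select_best(X, models):
    """models: {name: callable(X) -> labels}; returns (best_name, best_labels)."""
    results = {name: model(X) for name, model in models.items()}
    best = max(results, key=lambda name: silhouette(X, results[name]))
    return best, results[best]

X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])  # two groups along x
models = {  # hypothetical stand-ins for configured clustering models
    "split_y": lambda F: (F[:, 1] > 0.5).astype(int),
    "split_x": lambda F: (F[:, 0] > 5).astype(int),
}
best, labels = select_best(X, models)
print(best, labels)
```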

In the deep learning pipeline, the main flow is preserved. Only the underlying computations are changed: image feature extraction (done by feature extractor deep learning models) and similarity calculation (done by clustering models).

Main flow (hard-cornered items are folders on the computer, soft-cornered items are variables held in memory during runtime):

Computation workload and efficiency for main flow

$C$: number of classes
$O$: number of outliers
$S$: number of similar items

We can separate all calculations into two types: intra-class and inter-class calculations. Each individual class can be thought of as two graphs: $SG$ (similars graph) containing the similar items and $OG$ (outliers graph) containing the outlier items.

```
for each batch:
    for ith class in batch:
        # c_i is the set of items in the ith class
        # p_i is the probability of an item being in SG
        S_i = |c_i| * p_i        # number of similar items
        O_i = |c_i| * (1 - p_i)  # number of outlier items
        L_i = Comb(S_i, 2)       # number of links between similar items
```


The number of intra-class calculations needed for each class can be computed as follows:

  • To cover the links inside $SG$: $(S_i - 1)$ calculations, since clustering is transitive and a chain of $S_i - 1$ similar pairs is enough to connect all similar items
  • To cover the links inside $OG$: $\binom{O_i}{2}$ calculations
  • Links between $SG$ and $OG$: $O_i$ calculations (covered in the inter-class calculations)

So for each class $c_i$ in a batch, a total of
$(S_i - 1) + \frac{O_i(O_i - 1)}{2}$ calculations is enough to create the $SG$ and $OG$ graphs
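A small helper that evaluates the intra-class formula and compares it with brute-force pairwise checking of the same class; the class size (6 similar items, 4 outliers) is illustrative only.

```python
def intra_class_calculations(S_i: int, O_i: int) -> float:
    """(S_i - 1) comparisons chain the similar items; C(O_i, 2) cover the outliers."""
    return (S_i - 1) + O_i * (O_i - 1) / 2

S_i, O_i = 6, 4
n = S_i + O_i
# Brute force over the same class would need C(n, 2) comparisons.
print(intra_class_calculations(S_i, O_i), n * (n - 1) / 2)  # 11.0 45.0
```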


Now each $SG$ and $OG$ graph in the same batch must also be compared with the others. For each class pair $(c_i, c_j)$, the number of needed inter-class calculations can be computed as follows:

  • $(SG_i, SG_j)$: $1$ calculation
  • $(OG_i, OG_j)$: $\binom{O_i+O_j}{2}$ calculations
  • $(SG_i, OG_j)$: $O_j$ calculations
  • $(OG_i, SG_j)$: $O_i$ calculations

Each inter-class pair is calculated only once, so the four steps above generalize to all classes as follows:

  • to cover all $(SG, SG)$ graph pairs: $\frac{C(C - 1)}{2}$ calculations
  • to cover all $(OG, OG)$ graph pairs: $\frac{O(O - 1)}{2}$ calculations
  • to cover all $(SG, OG)$ graph pairs: $O*C$ calculations


So for a batch, a total of
$\frac{C(C - 1)}{2} + \frac{O(O - 1)}{2} + O*C$ calculations is needed
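Evaluating the per-batch inter-class total for an illustrative batch with $C = 5$ classes and $O = 20$ outliers in total:

```python
def inter_class_calculations(C: int, O: int) -> float:
    """SG-SG pairs + OG-OG pairs + SG-OG pairs for one batch."""
    return C * (C - 1) / 2 + O * (O - 1) / 2 + O * C

print(inter_class_calculations(5, 20))  # 10 + 190 + 100 = 300.0
```

The $\frac{O(O-1)}{2}$ term dominates as outliers grow, which is why a high expected similarity rate per class keeps the workload low.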


After all batch calculations are done, we end up with batch folders containing cluster and outlier folders. One item from every cluster folder is selected as a representative for the second phase. The representative items can be treated as a single class, so the needed computations are:
$(S_r - 1) + \frac{O_r(O_r - 1)}{2} + O_r$

Combining all of the above, the total similarity computation workload of one full main clustering pipeline is:

for batch in dataset:
  for ith class in batch:
    $(S_{i}-1) + \frac{O_i(O_i - 1)}{2}$
  $\frac{C(C - 1)}{2} + \frac{O(O - 1)}{2} + O*C$
$(S_{r}-1) + \frac{O_r(O_r - 1)}{2} + O_{r}$
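The same loop can be written as a single expression. Here $B$ is the number of batches, and the batch index $b$ is added only to make the sums explicit: $S_{b,i}$ and $O_{b,i}$ are the similar and outlier counts of class $i$ in batch $b$, and $O_b = \sum_i O_{b,i}$ is the total number of outliers in batch $b$.

$$
W = \sum_{b=1}^{B}\left[\sum_{i=1}^{C}\left((S_{b,i}-1) + \frac{O_{b,i}(O_{b,i}-1)}{2}\right) + \frac{C(C-1)}{2} + \frac{O_b(O_b-1)}{2} + O_b C\right] + (S_r - 1) + \frac{O_r(O_r-1)}{2} + O_r
$$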

To write the equations using known variables:

  • $B$: number of batches
  • $C$: number of classes in the dataset
  • $p_{i}$: expected probability of an item in class $c_{i}$ being similar
  • $P$: vector of expected similar items in each class: $[(|c_{1}|*p_{1}), (|c_{2}|*p_{2}), (|c_{3}|*p_{3})...]$
  • $T$: vector of expected outlier items in each class: $[(|c_{1}|-P_{1}), (|c_{2}|-P_{2}), (|c_{3}|-P_{3})...]$
  • $R$: the expected number of representatives is $B*C$ when all classes are distributed equally across batches. Among the representatives, $C$ are expected to be similars and $(B-1)*C$ outliers

for batch in dataset:
  for ith class in batch:
    $(P_{i}-1) + \frac{T_i(T_i - 1)}{2}$ --> calculates each classes' SG and OG
  $\frac{C(C - 1)}{2} + \frac{(\sum{T})((\sum{T}) - 1)}{2} + (\sum{T})*C$ --> merges SGs with OGs
$(C-1) + \frac{((B-1)*C)(((B-1)*C) - 1)}{2} + ((B-1)*C)$ --> merges representatives

Since all variables are now known, we can compute the expected number of computations, and how much more efficient clustering is than pairwise checking for a dataset, by running the code below with selected parameters. (Keep in mind that the code below models the worst-case scenario for clustering and calculates an approximate number of expected calculations. The clustering algorithm will be more efficient than these estimates thanks to parallel threads and some additional optimizations.)

```python
import random

N = 20000  # number of items in dataset
C = 5  # number of classes
b = 2500  # batch size
B = N // b  # number of batches
c = [f"c_{i}" for i in range(C)]  # expected class labels
p = [0.5, 0.5, 0.5, 0.5, 0.5]  # expected similarity rate for each class
classes_and_ps = dict(zip(c, p))  # class label -> expected similarity rate

# each item is (item_id, class_label); classes are assigned at random
dataset = [(str(i), random.choice(c)) for i in range(N)]

class_SG_OGs = []  # computations for every class in each batch
merge_SG_OGs = []  # computations at the end of every batch

for start in range(0, N, b):
    batch = dataset[start:start + b]
    Ts = []  # expected outlier count T_i of each class in this batch
    for c_id in c:
        class_items = [item for item in batch if item[1] == c_id]
        P_i = len(class_items) * classes_and_ps[c_id]  # expected similar items
        T_i = len(class_items) - P_i  # expected outlier items
        Ts.append(T_i)
        # intra-class: (P_i - 1) chain links in SG + all pairs inside OG
        class_SG_OGs.append((P_i - 1) + T_i * (T_i - 1) / 2)

    # inter-class: SG-SG pairs + OG-OG pairs + SG-OG pairs
    T = sum(Ts)
    merge_SG_OGs.append(C * (C - 1) / 2 + T * (T - 1) / 2 + T * C)

# second phase: C similar and (B - 1) * C outlier representatives
merge_representatives = (C - 1) + ((B - 1) * C) * ((B - 1) * C - 1) / 2 + (B - 1) * C

total_pairs = N * (N - 1) / 2
expected_calculations = sum(class_SG_OGs) + sum(merge_SG_OGs) + merge_representatives

print(f"Saved {100 - expected_calculations / total_pairs * 100:.2f}% of all computations.")
```

Further possible optimizations:

  • OpenMP optimizations
  • C/C++ optimizations
  • CUDA optimizations


Tool for interactively segmenting images. Further details about classes can be found here. The main image segmenting pipeline operates as follows:

1- The image is divided into segments with one of these methods. The segmented image has labeled segments starting from 1 (and edges with a value of 0, if any):

  • edge: the image is divided along edges using OpenCV operations

  • superpixel: OpenCV's superpixel is used

  • kmeans: OpenCV's kmeans is used

  • slickmeans: OpenCV's superpixel is applied first, then OpenCV's kmeans

  • chanvase: scikit-image's Chan-Vese is used

  • felzenszwalb: scikit-image's felzenszwalb is used

  • quickshift: scikit-image's quickshift is used

  • graph: OpenCV's graph segmentation is used

  • grabcut: OpenCV's grabcut is used. Segmentation is done manually on two windows with five annotation types:

    • Segments window: displays the current segments of image

    • Annotations window: displays the current annotations on image

    • rectangle annotation: drawn with the middle mouse button; indicates grabcut's area of attention

    • foreground and background annotation: drawn with left and right click; indicates pixels that are definitely foreground or background

    • possible foreground and background annotation: drawn with ctrl + left and right click; indicates pixels that may be foreground or background

      Keyboard inputs are also handled for actions other than painting:

    • q: quits the segmentation

    • f: finishes the image segmentation and passes image to interactive painting

    • r: resets the annotations

    • space: runs grabcut once (multiple presses may be needed for convergence)
      (Figure: annotations of a sample grabcut)
      (Figure: selected foreground)

  • SAM: Meta's Segment Anything Model is used. Segmentation is done by one of two SAM models: SamAutomaticMaskGenerator (doesn't require any annotation; all processing is automatic) or SamPredictor (a prompt must be created on a window with three annotation types):

    • Annotations window: displays the current segments of image

    • rectangle annotation: drawn with the middle mouse button; indicates the area of attention

    • foreground and background annotation: drawn with left and right click; indicates pixels that are definitely foreground or background

      Keyboard inputs are also handled for actions other than painting:

    • q: quits the segmentation

    • r: resets the annotations

    • space: ends segmenting and passes prompt to prediction function

    • f: finishes the segmentation and passes image to interactive painting

    • z: reverses the last annotation
      (Figure: annotations of a sample SAM)
      (Figure: generated mask)
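Every method above hands over a label map with segments numbered from 1 and edge pixels set to 0. The sketch below (plain NumPy; the `paint_segments` name and the click-to-segment logic are illustrative assumptions, not icas's API) shows how clicking one pixel can select the whole segment under it and how the selected segments become a binary mask, with clicks on edge pixels ignored.

```python
import numpy as np

def paint_segments(labels: np.ndarray, clicks) -> np.ndarray:
    """Boolean mask of every segment that contains a clicked (row, col) pixel.

    labels: 2-D int array, segments numbered from 1, edges marked with 0.
    """
    painted = {int(labels[r, c]) for r, c in clicks} - {0}  # clicks on edges do nothing
    return np.isin(labels, list(painted))

labels = np.array([
    [1, 1, 0, 2],
    [1, 1, 0, 2],
    [0, 0, 0, 2],
    [3, 3, 0, 2],
])
mask = paint_segments(labels, [(0, 0), (2, 1)])  # segment 1 plus a click on an edge
print(mask.astype(int))
```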

2- Two windows are shown to the user: one for selecting the color and the other for painting segments.

  • The color selection window is used to select the segmentation color and to display the painting mode. There are two paint modes besides the default click actions: one for continuous filling and the other for continuous unfilling. Each is toggled on and off by double-clicking the related mouse button.
    (Figure: sample image "jet1.jpg")
    (Figure: segments for "jet1.jpg" using superpixel. The method and its parameters should be tuned for better segments; this is only for explanatory purposes. The black lines around the red painted area are edge annotations, which are not originally included in the segments.)
    (Figure: painted image)
    (Figure: generated mask "jet1_mask_(R:204,G:0,B:0).png")

  • Painting is done in the segmenting window. Left click fills a segment and right click unfills it; rapid filling and unfilling can be done with the continuous modes. The middle button is used to make a cut: a line is cut between consecutively middle-clicked points, and the cut pixels are assigned as edges. Keyboard inputs are also handled for actions other than painting:

    • q: quits the segmentation
    • n: goes to next image in folder(no save)
    • p: goes to previous image in folder(no save)
    • space: saves the current image masks with "original_image_name_mask_(R:value,G:value,B:value).png" format and goes to next image
    • z: reverses the last action
    • r: resets the segmentation
    • d: displays the image segmentation and painted pixels for debug purposes
    • t: applies template painting. Template painting uses four base images: template, attention (optional), segment and mask (optional). Attention and mask images can be generated from the template and segment images if not provided.
      • template: the template to look for in the image
      • attention: a mask indicating which parts of the template are considered while looking for a match
      • segment: the paint to put over the found match
      • mask: indicates which pixels of the segment image will be painted onto the image
        (Figure: sample template; we will search for a plane in this pose)
        (Figure: sample attention; the sky is ignored and only plane similarity is considered)
        (Figure: sample segment; these pixels will be painted)
        (Figure: sample mask; only the white pixels will be painted)
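Template painting can be sketched as a brute-force masked search: slide the template over the image, score each position by the sum of squared differences over the attention pixels only, then copy the segment's masked pixels onto the best match. This is a plain-NumPy illustration under assumed conventions (grayscale arrays, a single best match); the default attention and mask rules in particular are assumptions, since the text does not describe how icas generates them.

```python
import numpy as np

def template_paint(image, template, segment, attention=None, mask=None):
    """Paint `segment` over the best match of `template` in `image`.

    image: 2-D array; template/segment/attention/mask: 2-D arrays of the same shape.
    Assumed defaults: every template pixel is attended, every non-zero
    segment pixel is painted.
    """
    th, tw = template.shape
    if attention is None:
        attention = np.ones(template.shape, dtype=bool)
    if mask is None:
        mask = segment != 0
    best_score, best_pos = np.inf, (0, 0)
    for r in range(image.shape[0] - th + 1):
        for c in range(image.shape[1] - tw + 1):
            window = image[r:r + th, c:c + tw].astype(float)
            # Sum of squared differences over attention pixels only.
            score = np.sum(((window - template) ** 2)[attention])
            if score < best_score:
                best_score, best_pos = score, (r, c)
    r, c = best_pos
    painted = image.copy()
    painted[r:r + th, c:c + tw][mask] = segment[mask]  # paint masked pixels only
    return painted, best_pos

image = np.zeros((6, 6), dtype=int)
image[2:4, 3:5] = [[5, 6], [7, 8]]  # the object we want to find
template = np.array([[5, 6], [7, 8]])
segment = np.array([[1, 0], [0, 1]])  # paint the diagonal only
painted, pos = template_paint(image, template, segment)
print(pos)
print(painted[2:4, 3:5])
```

An attention mask lets parts of the template (such as the sky around a plane) differ freely without hurting the match score.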

Note:

The libraries below are not installed with icas, since icas aims to be lightweight. Advanced usage such as deep learning clustering and SAM segmentation requires the following installations:

pip install torch
pip install torchvision
pip install scikit-learn
pip install git+https://github.com/facebookresearch/segment-anything.git