Computer Vision Taxonomy

Classical machine learning

  1. Scale-Invariant Feature Transform (SIFT)
  2. Speeded Up Robust Features (SURF)
  3. Histogram of Oriented Gradients (HOG)
  4. Support Vector Machine (SVM)
  5. Random Forest (RF)
  6. AdaBoost
  7. K-Nearest Neighbors (KNN)
  8. K-Means Clustering
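
As one concrete instance of the classical techniques listed above, the following is a minimal sketch of K-Means clustering (Lloyd's algorithm) applied to pixel colours. It uses only the Python standard library. The function name kmeans, its parameters, and the sample pixel values are illustrative assumptions, not taken from any particular library or dataset.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's algorithm on a list of equal-length numeric tuples (illustrative helper)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance).
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels


# Hypothetical RGB pixels: three dark, three bright.
pixels = [
    (10, 12, 9), (14, 8, 11), (9, 10, 13),
    (240, 235, 250), (250, 245, 238), (236, 248, 244),
]
centroids, labels = kmeans(pixels, k=2)
```

On this toy input the dark and bright pixels end up in separate clusters. Which cluster receives index 0 depends on the random initialisation.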

While these techniques remain useful for learning purposes, current state-of-the-art methods for image understanding rarely rely on them. The deep learning techniques listed below are more relevant to current practice.

Deep learning

  1. Convolutional Neural Networks (CNN)
    • ResNet
    • EfficientNet
  2. Region-based CNN (R-CNN)
    • Fast R-CNN
    • Faster R-CNN
    • Mask R-CNN
  3. You Only Look Once (YOLO)
    • YOLOv1, ..., YOLOv8
  4. Transformer-based
    • Vision Transformer (ViT)
    • Data-efficient Vision Transformer (DeiT)
    • End-to-End Object Detection with Transformers (DETR)
    • Swin Transformer: Hierarchical Vision Transformer
    • Fully Transformer-based Object Detector (ViDT)
    • Point Cloud Transformer (PCT)
  5. Vision Permutator: MLP-Like Architecture
  6. Generative Adversarial Networks (GAN)
  7. Diffusion models
  8. Tracking

Feature extraction