0. Computer Vision

Computer vision studies how computers can understand digital images and videos, with the goal of automating tasks that the human visual system can do [blog]

  • Preprocessing : image resizing, color correction, noise removal, etc.
  • Edge Extraction/Line Detection
  • Image/Video Segmentation : partitioning the pixels of an image into objects or regions
  • Object Detection/Tracking : identifying and tracking specific objects
  • Image Transformation

1. Background

1.1 History

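  • 1960s : Research on image pattern recognition began, based on the Perceptron concept. The focus was on pixel-level processing and pattern recognition [survey]
  • 1970~1980s : The theoretical foundations and techniques of image processing algorithms such as edge detection and histogram equalization were developed
  • 1990s : More complex and sophisticated algorithms and methodologies were introduced. Object Detection/Tracking and Segmentation techniques appeared
  • 2000s : With the advent of deep learning and neural network algorithms, model performance improved dramatically, and applications such as face recognition and object classification appeared

At first, Model-centric development that emphasized real-time processing took priority. Recently, improving accuracy through better training-data quality is also being actively studied, in addition to real-time performance. Besides the conventional CNN [video], Transformers are attracting attention: they introduce the "Attention Mechanism", which can process the information of an entire sequence at once.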

1.2 Basics of Digital Image/Video

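  • Pixel : the smallest unit of an image (1~4 values per pixel, one per channel)
  • Intensity Level : the number of distinct values each pixel can take. It is a power of 2
    • Normally 256 ( $2^8$ ), i.e. 8 bits per value
    • The more intensity levels, the more finely an image can be represented
  • Pixel Resolution : the number of pixels in an image

    • Total number of bits to store a digital image = the number of rows(height) * the number of columns(width) * the number of bits for one pixel ( $\log_2$ of the intensity level count)
    • 640x360 is the base unit, scaled by (width, height) factors : VGA(1, 4/3) = 640x480 - HD(2,2) = 1280x720 - FHD(3,3) = 1920x1080 - QHD(4,4) = 2560x1440 - UHD(6,6) = 3840x2160
  • FPS : the number of images(frames) of a video per second

    • the frame interval is normally about 33 ms (33 ms * 30 fps ≈ 1000 ms = 1 s)
    • The higher the FPS (the shorter the interval), the smoother the video looks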
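
As a quick check of the storage and frame-interval arithmetic above, here is a minimal sketch in plain Python. The function names and the `channels` parameter are illustrative assumptions, not part of the original notes, and the result is the uncompressed size only.

```python
def image_size_bits(height, width, intensity_levels=256, channels=1):
    """Bits needed to store an uncompressed image."""
    # Bits per value = log2(intensity levels); bit_length avoids float rounding.
    bits_per_value = (intensity_levels - 1).bit_length()
    return height * width * channels * bits_per_value


def frame_interval_ms(fps):
    """Time between two consecutive frames, in milliseconds."""
    return 1000.0 / fps


fhd_gray_bits = image_size_bits(1080, 1920)   # one 8-bit grayscale FHD frame
print(fhd_gray_bits // 8, "bytes per frame")
print(round(frame_interval_ms(30), 1), "ms between frames at 30 fps")
```

Multiplying the per-frame size by the FPS gives the raw data rate of a video, which is why compression matters in practice.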

2. Intensity Transformation

Intensity transformation is a way to enhance an image, using the following functions:

2.1 Mapping function

  • Negative Transformation : $output = max\ intensity - input$
  • Log Transformation : $output = c \cdot \log(1 + input)$
    • enhances the contrast of dark regions (and compresses bright ones)
  • Gamma(power-law) Correction : $output = c \cdot input^{\gamma}$ , with the input normalized to [0, 1]
    • gamma < 1 : enhance the contrast of dark region
    • gamma = 1 : identity
    • gamma > 1 : enhance the contrast of bright region
  • Piecewise-linear Transformation : a mapping built from several linear segments, which allows more complex curves
    • Thresholding is also possible (a single step from 0 to the maximum intensity)
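
The mapping functions above can be sketched per pixel in plain Python. This is only an illustration under stated assumptions: 8-bit intensities (`MAX_I = 255`), a log constant `c` chosen so that the maximum intensity maps to itself, and helper names that are not from the original notes.

```python
import math

MAX_I = 255  # assumption: 8-bit intensities


def negative(v):
    # input + output = max intensity
    return MAX_I - v


def log_transform(v):
    # c is chosen (as an assumption) so that MAX_I maps back to MAX_I.
    c = MAX_I / math.log(1 + MAX_I)
    return round(c * math.log(1 + v))


def gamma_correct(v, gamma, c=1.0):
    # Normalize to [0, 1], apply the power law, scale back to [0, MAX_I].
    return round(MAX_I * c * (v / MAX_I) ** gamma)


def threshold(v, t=128):
    # Piecewise-linear mapping reduced to a single step.
    return MAX_I if v >= t else 0


def apply(image, fn):
    """Apply a per-pixel mapping to a 2D list of intensities."""
    return [[fn(v) for v in row] for row in image]


dark = [[10, 40], [64, 90]]
print(apply(dark, negative))
print(apply(dark, log_transform))
print(apply(dark, lambda v: gamma_correct(v, 0.5)))
```

With gamma below 1 the dark values are pushed up and spread apart, while gamma above 1 pushes them down, which matches the bullets above.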

2.2 Filtering

2.2.1 Spatial Filtering

Spatial filters are also called spatial masks, kernels, templates, or windows.

  1. Define the kernel size (3,3), (5,5), ...
  2. Slide the kernel over the image and compute the output value at each position
  • Average Filtering : replace the value of the pixel by the average of the intensity levels in the neighborhood
    • reduces random noise
    • blurs the image
  • Gaussian Filtering : a weighted average in which neighbors closer to the center get larger weights
    • Discretized Gaussian kernel
    • Floating-point Gaussian kernel
  • Sharpening : highlights transitions in intensity

    • Second Derivative : $f''(x) = f(x+1) + f(x-1) - 2f(x)$ ; the sum of the second derivatives in x and y is the Laplacian
    • Unsharp Masking : subtract a blurred copy from the original to get a mask, then add the mask back to the original
  • Median Filtering : replace the value with the median value of the mask (3x3 -> 5th largest)

    • removes noise with less blurring, but needs more computation
    • effective against impulse (salt-and-pepper) noise
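
To make the difference between these filters concrete, here is a small sketch in plain Python using a 3x3 neighborhood. The border handling (replicating edge pixels), the 4-neighbor Laplacian kernel, and all function names are assumptions chosen for illustration.

```python
from statistics import median


def neighborhood(image, r, c):
    """3x3 values around (r, c) in row-major order, replicating border pixels."""
    h, w = len(image), len(image[0])
    return [image[min(max(r + dr, 0), h - 1)][min(max(c + dc, 0), w - 1)]
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)]


def filter3x3(image, reduce_fn):
    """Scan the image and reduce each 3x3 neighborhood to one output value."""
    return [[reduce_fn(neighborhood(image, r, c)) for c in range(len(image[0]))]
            for r in range(len(image))]


def mean_filter(image):
    return filter3x3(image, lambda vals: sum(vals) / 9)


def median_filter(image):
    return filter3x3(image, median)


# Laplacian kernel: second derivative in x plus second derivative in y.
LAPLACIAN = [0, 1, 0,
             1, -4, 1,
             0, 1, 0]


def laplacian(image):
    return filter3x3(image, lambda vals: sum(k * v for k, v in zip(LAPLACIAN, vals)))


flat = [[10] * 5 for _ in range(5)]
noisy = [row[:] for row in flat]
noisy[2][2] = 255  # a single "salt" pixel

print(mean_filter(noisy)[2][2])    # the outlier is smeared into its neighbors
print(median_filter(noisy)[2][2])  # the outlier is removed
print(laplacian(noisy)[2][2])      # strong response at the intensity transition
```

The mean filter spreads the impulse over the neighborhood, while the median filter discards it entirely, which is why the median is preferred for salt-and-pepper noise.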

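2.2.2 Histogram Equalization

Histogram equalization adjusts the contrast of an image by redistributing its intensity histogram. The result depends on the number of bins.

  • Contrast : The difference in brightness or color that makes an object distinguishable

CDF(Cumulative Distribution Function) : the cumulative probability that a pixel value is less than or equal to a given intensity. It is used as the mapping function.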
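
A minimal sketch of CDF-based equalization in plain Python follows. It assumes integer intensities in `[0, levels - 1]` and one bin per level, and uses the simple mapping `output = (levels - 1) * CDF(input)`; libraries may normalize slightly differently.

```python
def equalize(image, levels=256):
    """Histogram-equalize a 2D list of integer intensities."""
    pixels = [v for row in image for v in row]

    # Histogram: how many pixels have each intensity.
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1

    # CDF: probability that a pixel value is <= each intensity.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running / len(pixels))

    # Lookup table that stretches the CDF over the full intensity range.
    lut = [round((levels - 1) * p) for p in cdf]
    return [[lut[v] for v in row] for row in image]


low_contrast = [[100, 101], [102, 103]]
print(equalize(low_contrast))
```

The four nearly identical input values end up spread across the whole range, while their relative order is preserved.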

3. Color Image Enhancement

How to enhance a color image.

3.1 Color Model

  • RGB
  • HSI : Hue (color, 0~360°), Saturation (purity of the color), Intensity (brightness)
  • YUV(YCbCr) : Y (luma, brightness), Cb (chroma blue, Blue - Y), Cr (chroma red, Red - Y)
  • Grayscale image : lightness is the only parameter of a pixel that can vary
  • Achromatic Color : gray, black, white
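
As an illustration of how a luma/chroma model separates brightness from color, the sketch below converts one RGB pixel to YCbCr. It assumes the full-range BT.601 variant for 8-bit values (chroma offset of 128); other standards use different weights, so treat the constants as one possible choice.

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 conversion for 8-bit values (an assumed variant)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luma: weighted brightness
    cb = 128 + 0.564 * (b - y)              # chroma blue: scaled (Blue - Y)
    cr = 128 + 0.713 * (r - y)              # chroma red: scaled (Red - Y)
    return y, cb, cr


print(rgb_to_ycbcr(128, 128, 128))  # achromatic gray: both chroma values stay near 128
print(rgb_to_ycbcr(255, 0, 0))      # pure red: Cr well above 128, Cb below 128
```

For achromatic colors the chroma channels carry no information, so a grayscale image is fully described by the Y channel.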

3.2 Color Processing

Intensity transformation, histogram equalization, and spatial filtering are applied to the intensity channel only! This is why converting from RGB to a color space with a separate intensity channel, such as HSI or YUV, is useful.

  • Color Conversion : CV_BGR2HSV, CV_BGR2GRAY

  • Pseudo Coloring : gray image to color image
  • Color Slicing : Find the pixels in the range of the desired color in the Hue channel. Set the Saturation channel of all the other pixels to 0, so that they appear in grayscale
  • White balancing : global adjustment of the intensities of the colors
    • Gray World Assumption : the average of all the colors is a neutral gray, so each channel is scaled by its own average : $result = original*(128/average)$
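
The gray-world rule can be sketched in plain Python as below. It assumes 8-bit RGB tuples, applies the gain to each channel independently, and clips at 255; the function name and the sample pixels are made up for illustration.

```python
def gray_world(image):
    """White-balance rows of (r, g, b) tuples under the gray-world assumption."""
    n = sum(len(row) for row in image)

    # Average of each channel over the whole image.
    avg = [sum(px[ch] for row in image for px in row) / n for ch in range(3)]

    # result = original * (128 / average), computed per channel.
    gain = [128 / a if a > 0 else 1.0 for a in avg]

    return [[tuple(min(255, round(px[ch] * gain[ch])) for ch in range(3))
             for px in row]
            for row in image]


bluish = [[(40, 60, 120), (80, 100, 200)]]  # hypothetical image with a blue cast
print(gray_world(bluish))
```

After the correction every channel averages 128, so the blue cast is removed at the cost of assuming the scene really is gray on average.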

4. Edge/Line Detection

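Edge : a location where the intensity changes sharply. Because derivatives amplify noise, image smoothing for noise reduction should be performed first (e.g. median/average filtering)

  • Sobel Mask : a pair of 3x3 kernels that approximate the first derivative in the x and y directions
  • Canny

    • smoothing : remove noise
    • Sobel to calculate the gradient : magnitude (derivative) and angle
    • non-maxima suppression : keep only the local maximum along the gradient direction
    • double thresholding & connectivity analysis : determine whether each remaining pixel is an edge or not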
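
The gradient step shared by the Sobel mask and Canny can be sketched as follows. This is plain Python on a 2D list, with border pixels left at zero as a simplifying assumption; it covers only the gradient magnitude and angle, not the later Canny stages.

```python
import math

SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1],
           [0, 0, 0],
           [1, 2, 1]]


def sobel(image):
    """Return (magnitude, angle in degrees) maps; border pixels stay 0."""
    h, w = len(image), len(image[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = sum(SOBEL_X[i][j] * image[r + i - 1][c + j - 1]
                     for i in range(3) for j in range(3))
            gy = sum(SOBEL_Y[i][j] * image[r + i - 1][c + j - 1]
                     for i in range(3) for j in range(3))
            mag[r][c] = math.hypot(gx, gy)
            ang[r][c] = math.degrees(math.atan2(gy, gx))
    return mag, ang


# Vertical step edge: dark on the left, bright on the right.
step = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
mag, ang = sobel(step)
print(mag[2])  # large values only next to the step
print(ang[2])  # gradient points along +x, i.e. 0 degrees
```

Non-maxima suppression would then compare each magnitude with its two neighbors along the stored angle and keep only the largest.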

Line

  • Hough Transform
    • obtain a binary edge image
    • specify the subdivisions (accumulator cells) of the $\rho\theta$-plane
    • examine the counts of the accumulator cells for high pixel concentrations

Circle detection works the same way, with an accumulator over the circle parameters.
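
A minimal sketch of the voting procedure for lines, in plain Python. It assumes the normal form $\rho = x\cos\theta + y\sin\theta$, a 1-degree by 1-pixel accumulator grid stored in a dictionary, and edge pixels given as `(x, y)` tuples; these choices are illustrative, not taken from the notes.

```python
import math


def hough_lines(edge_points, max_rho, theta_steps=180):
    """Vote in a (rho, theta index) accumulator for every edge point."""
    acc = {}
    for x, y in edge_points:
        for t in range(theta_steps):
            theta = math.pi * t / theta_steps
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            if abs(rho) <= max_rho:
                acc[(rho, t)] = acc.get((rho, t), 0) + 1
    return acc


# Edge pixels lying on the horizontal line y = 5.
points = [(x, 5) for x in range(0, 100, 10)]
acc = hough_lines(points, max_rho=150)

best_cell = max(acc, key=acc.get)
print(best_cell, acc[best_cell])  # cell with the highest pixel concentration
```

Every point on the same line votes for the same cell, so the peak of the accumulator identifies the line parameters.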

5. Image Segmentation

The process of partitioning a digital image into multiple regions.

  1. Background / foreground separation
  2. Thresholding : how to define a proper threshold (between background and object) is important

  • global : Basic, Otsu : performance measure : within-class variance / between-class variance
    • Basic : repeat $T = (m_1 + m_2)/2$ , where $m_1$ and $m_2$ are the mean intensities of the two groups split by the current $T$ , until the change is small enough
    • Otsu : choose the threshold that maximizes the between-class variance
  • local(adaptive) : set a threshold for each pixel depending on the intensity distribution of its adjacent pixels

  3. GrabCut
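
The two global thresholding methods (Basic and Otsu) can be sketched in plain Python as below. The stopping tolerance `eps`, the convention that pixels `<= T` belong to the lower class, and the sample pixel values are assumptions for illustration.

```python
def basic_threshold(pixels, eps=0.5):
    """Iterate T = (m1 + m2) / 2 until T changes by less than eps."""
    t = sum(pixels) / len(pixels)  # start from the global mean
    while True:
        low = [v for v in pixels if v <= t]
        high = [v for v in pixels if v > t]
        if not low or not high:
            return t
        new_t = (sum(low) / len(low) + sum(high) / len(high)) / 2
        if abs(new_t - t) < eps:
            return new_t
        t = new_t


def otsu_threshold(pixels, levels=256):
    """Return the threshold that maximizes the between-class variance."""
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1
    total = len(pixels)
    total_sum = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(levels):
        w0 += hist[t]            # pixel count of the class <= t
        sum0 += t * hist[t]
        w1 = total - w0          # pixel count of the class > t
        if w0 == 0 or w1 == 0:
            continue
        m0, m1 = sum0 / w0, (total_sum - sum0) / w1
        var_between = (w0 / total) * (w1 / total) * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


pixels = [20, 22, 25, 30, 200, 210, 220, 230]  # dark background, bright object
print(basic_threshold(pixels), otsu_threshold(pixels))
```

Both methods separate the dark group from the bright group here; they differ mainly on histograms where the two classes overlap or have very different sizes.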

6. Video Segmentation (Background Subtraction)

  1. Background modeling

  • GMM (Gaussian Mixture Model) : $p(B|A)$ and $p(A|B)$ => $p(X|background)$ and $p(background|X)$ : the form of ML

  2. How to subtract

Morphological Operation

  • Opening (Erosion -> Dilation) : breaks narrow isthmuses and eliminates small islands and sharp peaks
  • Closing (Dilation -> Erosion) : fuses narrow breaks and long thin gulfs and eliminates small holes
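
Opening and closing on a binary mask can be sketched in plain Python as follows. The sketch assumes a 3x3 square structuring element and ignores out-of-image neighbors at the border; the function names and the test masks are illustrative.

```python
def _morph(image, pick):
    """Apply min (erosion) or max (dilation) over each 3x3 neighborhood."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            vals = [image[rr][cc]
                    for rr in range(r - 1, r + 2)
                    for cc in range(c - 1, c + 2)
                    if 0 <= rr < h and 0 <= cc < w]
            out[r][c] = pick(vals)
    return out


def erode(image):
    return _morph(image, min)


def dilate(image):
    return _morph(image, max)


def opening(image):
    # Erosion followed by dilation: removes small islands.
    return dilate(erode(image))


def closing(image):
    # Dilation followed by erosion: fills small holes.
    return erode(dilate(image))


# A 3x3 foreground square plus one isolated noise pixel at (5, 5).
mask = [[0] * 7 for _ in range(7)]
for r in range(1, 4):
    for c in range(1, 4):
        mask[r][c] = 1
clean = [row[:] for row in mask]
mask[5][5] = 1

print(opening(mask) == clean)  # the island is gone, the square is kept
```

In background subtraction these operations are typically applied to the foreground mask to remove speckle noise and close gaps inside detected objects.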

9. Image Feature Matching

Image Feature : a piece of information that is relevant for solving a computational task, such as specific structures (points, edges, objects)

  • ORB : oFAST detector + rBRIEF descriptor (Oriented FAST and Rotated BRIEF)

    • FAST : Declares a pixel a corner if more than N consecutive pixels on the circle around it have intensities that are all higher or all lower than the center pixel
    • BRIEF : A bit string descriptor of an image patch constructed from a set of binary intensity tests
  • Good feature = inexpensive and memory efficient

    • NNDR(Nearest neighbor distance ratio) $= \frac{\text{distance to best match}}{\text{distance to second best match}}$ ; a match is kept only when this ratio is small
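
Matching binary descriptors with the NNDR test can be sketched in plain Python as below. Descriptors are represented as small integers used as bit strings, distances are Hamming distances, and the `max_ratio=0.75` cutoff is an assumed value, not one given in the notes.

```python
def hamming(a, b):
    """Number of differing bits between two binary descriptors."""
    return bin(a ^ b).count("1")


def match_nndr(query, train, max_ratio=0.75):
    """Keep a match only if best distance / second-best distance < max_ratio."""
    matches = []
    for qi, q in enumerate(query):
        ranked = sorted((hamming(q, t), ti) for ti, t in enumerate(train))
        if len(ranked) < 2:
            continue
        (d1, best), (d2, _) = ranked[0], ranked[1]
        if d2 > 0 and d1 / d2 < max_ratio:
            matches.append((qi, best, d1))
    return matches


# Hypothetical 8-bit descriptors (real BRIEF descriptors are much longer).
train = [0b10110010, 0b01001101, 0b11110000]
query = [0b10110011,   # one bit away from train[0]: distinctive match
         0b00111100]   # equally far from all train descriptors: ambiguous

print(match_nndr(query, train))
```

The first query is accepted because its best match is much closer than the runner-up, while the second is rejected because no candidate stands out.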

CNN

  • Convolution
  • ReLU
  • Pooling
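
The three building blocks can be sketched on a single-channel image in plain Python. This assumes no padding, stride 1 for the convolution, and non-overlapping 2x2 max pooling; the kernel values are arbitrary (in a real CNN they are learned), and the names are illustrative.

```python
def conv2d(image, kernel):
    """Valid (no padding) cross-correlation, as computed by CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(kernel[i][j] * image[r + i][c + j]
                 for i in range(kh) for j in range(kw))
             for c in range(len(image[0]) - kw + 1)]
            for r in range(len(image) - kh + 1)]


def relu(fmap):
    """Keep positive activations, set negative ones to 0."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling over size x size blocks."""
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]


image = [[3, 0, 1, 2, 1],
         [1, 2, 0, 1, 3],
         [0, 1, 3, 0, 2],
         [2, 0, 1, 1, 0],
         [1, 3, 0, 2, 1]]
kernel = [[1, 0],
          [0, -1]]  # arbitrary example weights

features = max_pool(relu(conv2d(image, kernel)))
print(features)  # 5x5 input -> 4x4 feature map -> 2x2 pooled map
```

Each stage shrinks or filters the representation: convolution extracts a local pattern, ReLU removes negative responses, and pooling keeps the strongest response per block.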

10. Detection & Tracking

11. Object Detection using Deep Learning

YOLO

12. Projective Transformation

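After a similarity or affine transformation, parallel lines remain parallel. A projective transformation keeps straight lines straight but does not necessarily preserve parallelism.
Generality : Similarity < Affine < Projective

  • Homography (projective/perspective transformation) : a 3x3 matrix applied to points in homogeneous coordinates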
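
The effect on parallel lines can be checked with a small sketch in plain Python. A point is mapped through a 3x3 matrix in homogeneous coordinates and divided by the third component; both matrices below are hypothetical examples, and the helper names are assumptions.

```python
def apply_homography(H, x, y):
    """Map (x, y) through a 3x3 matrix using homogeneous coordinates."""
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return xh / w, yh / w


def direction(H, p, q):
    """Direction vector of the transformed segment p -> q."""
    (x0, y0), (x1, y1) = apply_homography(H, *p), apply_homography(H, *q)
    return x1 - x0, y1 - y0


def cross(u, v):
    """2D cross product: zero means the two directions are parallel."""
    return u[0] * v[1] - u[1] * v[0]


AFFINE = [[2, 0, 1], [0, 2, 1], [0, 0, 1]]          # last row is (0, 0, 1)
PROJECTIVE = [[1, 0, 0], [0, 1, 0], [0.1, 0, 1]]    # non-trivial last row

# Two parallel horizontal segments: y = 0 and y = 1.
a, b = ((0, 0), (1, 0)), ((0, 1), (1, 1))

print(cross(direction(AFFINE, *a), direction(AFFINE, *b)))          # parallel
print(cross(direction(PROJECTIVE, *a), direction(PROJECTIVE, *b)))  # not parallel
```

The only difference between the two matrices is the last row: once it is no longer (0, 0, 1), the division by the third component varies across the image and parallel lines converge.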

13. Understanding of a Camera

14. Image Compression

Feature => SIFT, ORB, corners

  • extraction : position of features (FAST)
  • description : makes features ready for feature matching (BRIEF). Each intensity test gives one bit ( > : 0, <= : 1), which builds a binary string
  • NNDR : compares the distance to the best match with the distance to the second best match

Detection & Tracking

  • Face Detector : Haar-like features
    • training = find good Haar-like features
    • boosting : train the weak learners sequentially
    • Strong learner : uses a small number of weak learners ->