How computers can understand digital images or videos, i.e. how to automate tasks that the human visual system can do [blog]
- Preprocessing : image resizing, color correction, noise removal, etc.
- Edge Extraction/Line Detection
- Image/Video Segmentation : partitioning the pixels that make up an image into objects or regions
- Object Detection/Tracking : identifying and tracking specific objects
- Image Transformation
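- 1960s : Research on image pattern recognition began, based on the Perceptron concept. The focus was on pixel-level processing and pattern recognition [survey]
- 1970~1980s : The theoretical foundations and techniques of image processing algorithms such as edge detection and histogram equalization were developed
- 1990s : More complex and sophisticated algorithms and methodologies were introduced. Object Detection/Tracking and Segmentation techniques emerged
- 2000s : With the emergence of deep learning and neural network algorithms, model performance improved dramatically, and applications such as face recognition and object classification appeared. Model-centric development emphasizing real-time processing was prioritized at first, but recently improving accuracy through better training-data quality is also being actively studied, in addition to real-time performance. Besides conventional CNNs [video], Transformers, which introduce an "Attention Mechanism" that can process the information of an entire sequence at once, are attracting attention.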
- Pixel : the smallest unit of an image (1~4 values per pixel)
- Intensity Level : the number of distinct values each pixel can take. It is a power of 2
  - Normally 256 ($2^8$)
  - The more intensity levels, the more finely the image can be represented
- Pixel Resolution : the number of pixels an image has
  - Total number of bits to store a digital image = number of rows (height) * number of columns (width) * number of bits per pixel ($k$ bits for $2^k$ intensity levels)
  - With 640x360 as the base unit (width, height multipliers) : VGA(1, 4/3) - HD(2, 2) - FHD(3, 3) - QHD(4, 4) - UHD(6, 6)
- FPS : the number of images (frames) of a video per second
  - the interval between frames is normally about 33 ms (33 ms * 30 frames ≈ 1000 ms = 1 s)
  - The higher the FPS (the shorter the interval), the smoother the video looks
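
The storage formula and the frame interval above can be checked with a few lines of Python. This is a minimal sketch: the function names and the `channels` parameter are illustrative assumptions, and it only covers uncompressed images.

```python
def image_storage_bits(height: int, width: int, intensity_levels: int, channels: int = 1) -> int:
    """Bits needed to store an uncompressed image."""
    # k bits are enough for 2^k intensity levels (256 levels -> 8 bits)
    bits_per_value = (intensity_levels - 1).bit_length()
    return height * width * channels * bits_per_value


def frame_interval_ms(fps: float) -> float:
    """Time between two consecutive frames in milliseconds."""
    return 1000.0 / fps


if __name__ == "__main__":
    bits = image_storage_bits(height=360, width=640, intensity_levels=256)
    print(f"640x360 grayscale: {bits} bits = {bits // 8} bytes")
    print(f"30 fps -> {frame_interval_ms(30):.1f} ms per frame")
```

A 3-channel color image of the same size needs three times as many bits (`channels=3`).
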
Ways to enhance an image, using the following functions:
- Negative Transformation : $input + output = \text{max intensity}$
- Log Transformation : $output = c \cdot \log(input + 1)$
  - enhances the contrast of dark regions
- Gamma (power-law) Correction : $output = c \cdot input^{\gamma}$
  - $\gamma < 1$ : enhances the contrast of dark regions
  - $\gamma = 1$ : identity
  - $\gamma > 1$ : enhances the contrast of bright regions
- Piecewise-linear Transformation : more complex
  - Thresholding is also possible
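
The transformations above can be sketched per pixel as follows. The sketch assumes 8-bit images (maximum intensity 255). The choice of `c` (scaled so that 255 maps back to 255) and the function names are assumptions made for illustration, not part of the notes.

```python
import math

MAX_INTENSITY = 255  # assumption: 8-bit image


def negative(r: int) -> int:
    # input + output = max intensity
    return MAX_INTENSITY - r


def log_transform(r: int) -> int:
    # c is chosen (assumption) so that the maximum input maps to the maximum output
    c = MAX_INTENSITY / math.log(1 + MAX_INTENSITY)
    return round(c * math.log(r + 1))


def gamma_correct(r: int, gamma: float) -> int:
    # normalize to [0, 1], apply the power law, scale back to [0, 255]
    return round(MAX_INTENSITY * (r / MAX_INTENSITY) ** gamma)


def threshold(r: int, t: int) -> int:
    # simplest piecewise-linear case: everything above t becomes white
    return MAX_INTENSITY if r > t else 0


if __name__ == "__main__":
    dark_pixel = 64
    print("negative:", negative(dark_pixel))
    print("log:", log_transform(dark_pixel))
    print("gamma 0.5:", gamma_correct(dark_pixel, 0.5))
    print("gamma 2.0:", gamma_correct(dark_pixel, 2.0))
```

For a dark pixel, the log transform and $\gamma < 1$ both push the value up, while $\gamma > 1$ pushes it down.
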
Spatial filters are also called spatial masks, kernels, templates, or windows.
- Define the kernel size : (3,3), (5,5), ...
- Scan the image with the mask
- Average Filtering : replace the value of the pixel with the average of the intensity levels in the neighborhood
  - reduces random noise
  - blurs the image
- Gaussian Filtering : assign weights to the neighborhood
  - Discretized Gaussian kernel
  - Floating-point Gaussian kernel
- Sharpening : highlights transitions in intensity
- Median Filtering : replace the value with the median value of a mask (3x3 -> 5th largest)
  - removes noise without blurring but needs more computation
  - effective against impulse (salt-and-pepper) noise
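
The difference between average and median filtering shows up on a single impulse pixel. The sketch below uses a 3x3 mask on a nested-list image and copies border pixels unchanged; the border handling and the helper names are simplifying assumptions.

```python
from statistics import median


def filter3x3(image, reducer):
    """Apply `reducer` to every 3x3 neighborhood; border pixels are copied unchanged."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [image[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = reducer(window)
    return out


def mean_filter(image):
    return filter3x3(image, lambda win: round(sum(win) / len(win)))


def median_filter(image):
    # for 9 values the median is the 5th largest
    return filter3x3(image, lambda win: int(median(win)))


# flat image with one "salt" pixel in the middle
noisy = [[10] * 5 for _ in range(5)]
noisy[2][2] = 255

if __name__ == "__main__":
    print("mean  :", mean_filter(noisy)[2])
    print("median:", median_filter(noisy)[2])
```

The average filter smears the impulse over its neighbors, while the median filter removes it completely.
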
Depending on the number of histogram bins, the contrast of an image is adjusted.
- Contrast : the difference in brightness or color that makes an object distinguishable
CDF (Cumulative Distribution Function) : the cumulative probability of pixel values, i.e. the fraction of pixels whose intensity is at or below a given level
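
A common use of the CDF is histogram equalization, where the CDF itself serves as the intensity mapping. The following is a minimal sketch assuming an 8-bit grayscale image stored as nested lists; the function name and the simple `(levels - 1) * cdf` mapping are illustrative choices.

```python
def equalize(image, levels=256):
    """Histogram equalization: map each intensity through the scaled CDF."""
    flat = [p for row in image for p in row]

    # histogram: how many pixels have each intensity
    hist = [0] * levels
    for p in flat:
        hist[p] += 1

    # CDF: fraction of pixels with intensity <= level
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running / len(flat))

    mapping = [round((levels - 1) * c) for c in cdf]
    return [[mapping[p] for p in row] for row in image]


low_contrast = [[100, 100, 101], [101, 102, 102], [103, 103, 103]]

if __name__ == "__main__":
    for row in equalize(low_contrast):
        print(row)
```

The input only spans intensities 100~103, while the output is stretched across most of the 0~255 range.
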
How to enhance a color image
- RGB
- HSI : Hue (color, 0~360°), Saturation (purity), Intensity (brightness)
- YUV (YCbCr) : Y (luma, brightness), Cb (chroma blue : Blue - Y), Cr (chroma red : Red - Y)
- Grayscale image : lightness is the only parameter of a pixel that can vary
- Achromatic Color : gray, black, white
Intensity transformation, histogram equalization, and spatial filtering are applied on the intensity channel only!
- Color Conversion : CV_BGR2HSV, CV_BGR2GRAY
This means that converting the color space from RGB to HSI or YUV is useful.
- Pseudo Coloring : gray image to color image
- Color Slicing : Find the pixels in the range of the desired color in the Hue-channel. Set all the other pixels to 0 in the Saturation-channel (grayscale image)
- White balancing : global adjustment of the intensities of the colors
  - Gray World Assumption : the average of all the colors is a neutral gray : $result = original \times (128 / average)$
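
The gray-world formula can be applied per channel as in the sketch below. It assumes an 8-bit image stored as rows of `(b, g, r)` tuples. The clipping to 255 and the guard against an all-zero channel are additions made for this example.

```python
def gray_world(image, target=128):
    """White balance under the gray-world assumption: result = original * (target / average)."""
    n = sum(len(row) for row in image)
    averages = [sum(px[c] for row in image for px in row) / n for c in range(3)]
    # guard (assumption): leave a channel untouched if its average is 0
    gains = [target / avg if avg > 0 else 1.0 for avg in averages]
    return [
        [tuple(min(255, round(px[c] * gains[c])) for c in range(3)) for px in row]
        for row in image
    ]


# a small image with a blue cast, pixels given as (b, g, r)
bluish = [[(200, 100, 50), (180, 120, 70)]]

if __name__ == "__main__":
    print(gray_world(bluish))
```

After balancing, the average of every channel is close to 128, which removes the global color cast.
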
Edge : a location with a large intensity difference. Image smoothing for noise reduction (e.g. median/average filtering) should be performed first
- Sobel Mask
- Canny
  - smoothing : remove noise
  - Sobel to calculate the gradient : angle and derivative (magnitude)
  - non-maxima suppression : keep only the local maximum
  - double thresholding & connectivity analysis : determine whether a pixel is an edge or not
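
The gradient step (Sobel masks, magnitude and angle) can be sketched as below. Only that step is shown, not the full Canny pipeline. The image is a nested list and border pixels are left at 0, which is a simplification for this example.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]    # responds to horizontal intensity changes
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]    # responds to vertical intensity changes


def sobel(image):
    """Return (magnitude, angle_in_degrees) maps; border pixels stay at 0."""
    h, w = len(image), len(image[0])
    magnitude = [[0.0] * w for _ in range(h)]
    angle = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(SOBEL_X[j][i] * image[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * image[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            magnitude[y][x] = math.hypot(gx, gy)
            angle[y][x] = math.degrees(math.atan2(gy, gx))
    return magnitude, angle


# vertical edge between column 1 and column 2
step = [[0, 0, 255, 255] for _ in range(4)]

if __name__ == "__main__":
    mag, ang = sobel(step)
    print("magnitude:", mag[1])
    print("angle    :", ang[1])
```

For a vertical edge the gradient points along the x axis, so the angle at the edge is 0°.
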
Line
- Hough Transform
  - obtain a binary edge image
  - specify subdivisions in the $\rho\theta$-plane (parameter space)
  - examine the counts of the accumulator cells for high pixel concentrations
- Circle detection works the same way
Process of partitioning a digital image into multiple regions.
1. Background / foreground
2. Thresholding : defining a proper threshold (between background and object) is important
   - global : Basic, Otsu. Performance measurement : within-class variance / between-class variance
     - Basic : repeat T = (m1 + m2) / 2 until the change is small enough, where m1 and m2 are the mean intensities of the two groups split by T
     - Otsu : choose the threshold that maximizes the between-class variance
   - local (adaptive) : set a threshold depending on the intensity distribution of adjacent pixels
3. GrabCut
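
Both global methods can be sketched on a flat list of 8-bit pixel values. The stopping tolerance `eps`, the use of the global mean as the starting threshold, and the convention that pixels `<= T` belong to the first class are assumptions for this example.

```python
def basic_threshold(pixels, eps=0.5):
    """Iterate T = (m1 + m2) / 2 until T changes by less than eps."""
    t = sum(pixels) / len(pixels)  # assumption: start from the global mean
    while True:
        low = [p for p in pixels if p <= t]
        high = [p for p in pixels if p > t]
        if not low or not high:
            return t
        new_t = (sum(low) / len(low) + sum(high) / len(high)) / 2
        if abs(new_t - t) < eps:
            return new_t
        t = new_t


def otsu_threshold(pixels, levels=256):
    """Return the threshold that maximizes the between-class variance."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * hist[i] for i in range(levels))

    best_t, best_var = 0, -1.0
    w0 = sum0 = 0
    for t in range(levels):
        w0 += hist[t]            # number of pixels in class 0 (<= t)
        sum0 += t * hist[t]
        w1 = total - w0          # number of pixels in class 1 (> t)
        if w0 == 0:
            continue
        if w1 == 0:
            break
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        between = w0 * w1 * (m0 - m1) ** 2  # proportional to between-class variance
        if between > best_var:
            best_var, best_t = between, t
    return best_t


# dark background around 20~30, bright object around 200~210
pixels = [20] * 50 + [30] * 50 + [200] * 50 + [210] * 50

if __name__ == "__main__":
    print("basic:", basic_threshold(pixels))
    print("otsu :", otsu_threshold(pixels))
```

On a clearly bimodal histogram both methods land in the gap between the two modes.
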
1. Background
   - GMM (Gaussian Mixture Model) : p(B|A) and p(A|B) => p(X|background) and p(background|X) : the form of ML
2. How to subtract
   - Erosion -> Opening (erosion followed by dilation) : breaks narrow isthmuses and eliminates small islands and sharp peaks
   - Dilation -> Closing (dilation followed by erosion) : fuses narrow breaks and long thin gulfs and eliminates small holes
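
Opening and closing can be sketched on a binary mask with a 3x3 square structuring element. Pixels outside the image are treated as background (0), which is an assumption of this sketch and affects objects touching the border.

```python
OFFSETS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]  # 3x3 structuring element


def _morph(mask, combine):
    h, w = len(mask), len(mask[0])

    def at(y, x):
        # assumption: everything outside the image is background
        return mask[y][x] if 0 <= y < h and 0 <= x < w else 0

    return [[int(combine(at(y + dy, x + dx) for dy, dx in OFFSETS)) for x in range(w)]
            for y in range(h)]


def erode(mask):
    return _morph(mask, all)   # 1 only if the whole neighborhood is 1


def dilate(mask):
    return _morph(mask, any)   # 1 if any neighbor is 1


def opening(mask):
    return dilate(erode(mask))


def closing(mask):
    return erode(dilate(mask))


with_island = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 1, 0],   # isolated pixel at column 5
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
with_hole = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],         # hole in the middle
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

if __name__ == "__main__":
    print("opening:", opening(with_island)[2])
    print("closing:", closing(with_hole)[2])
```

Opening removes the isolated pixel but keeps the 3x3 block, and closing fills the hole without growing the block.
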
Image Feature : a piece of information that is relevant for solving the computational task, such as specific structures (points, edges, objects)
- ORB : oFAST detector + rBRIEF descriptor
  - FAST : determines a corner by having more than N consecutive pixels, on a circle around the candidate pixel, whose intensities are all higher or all lower than the candidate
  - BRIEF : a bit-string descriptor of an image patch constructed from a set of binary intensity tests
- Good feature = inexpensive and memory efficient
  - NNDR (Nearest Neighbor Distance Ratio) : $\text{NNDR} = \frac{\text{distance to best match}}{\text{distance to second best match}}$
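
The NNDR test can be sketched for binary descriptors compared with Hamming distance. Descriptors are represented as Python integers, and the acceptance threshold `max_ratio=0.8` is an assumed value for illustration, not something fixed by the notes.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors."""
    return bin(a ^ b).count("1")


def match_nndr(query: int, candidates, max_ratio: float = 0.8):
    """Return the index of the best match, or None if the match is ambiguous.

    Needs at least two candidates so that a second-best distance exists.
    """
    distances = sorted((hamming(query, c), i) for i, c in enumerate(candidates))
    (best, best_idx), (second, _) = distances[0], distances[1]
    if second == 0:
        return None  # two exact matches: cannot tell them apart
    nndr = best / second
    return best_idx if nndr < max_ratio else None


query = 0b10110010
distinct = [0b10110011, 0b01001101, 0b11110000]   # Hamming distances 1, 8, 2
ambiguous = [0b10110011, 0b10110000]              # Hamming distances 1, 1

if __name__ == "__main__":
    print("distinct :", match_nndr(query, distinct))
    print("ambiguous:", match_nndr(query, ambiguous))
```

A small ratio means the best match is much closer than the runner-up, so the match is kept. A ratio near 1 means the two candidates are equally plausible, so the match is rejected.
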
YOLO
After an affine transformation, parallel lines remain parallel
Similarity < Affine < Projective
- Homography (projective/perspective transformation)
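
Applying a homography to a point amounts to a 3x3 matrix product in homogeneous coordinates followed by a division by the last component. The sketch below uses nested lists, and both example matrices are made up for illustration.

```python
def apply_homography(H, point):
    """Map (x, y) through a 3x3 matrix H using homogeneous coordinates."""
    x, y = point
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (xh / w, yh / w)  # perspective division


# affine case: last row is [0, 0, 1], so w is always 1 (here a pure translation)
TRANSLATE = [[1, 0, 2], [0, 1, 3], [0, 0, 1]]

# projective case: a non-trivial last row makes w depend on the point
PERSPECTIVE = [[1, 0, 0], [0, 1, 0], [0.5, 0, 1]]

if __name__ == "__main__":
    print(apply_homography(TRANSLATE, (1, 1)))
    print(apply_homography(PERSPECTIVE, (2, 0)))
```

With the last row fixed to `[0, 0, 1]` the mapping is affine and keeps parallel lines parallel. A general last row gives the projective case, where they may converge.
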
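Feature => SIFT, ORB, corners
- extraction : position of features (FAST)
- description : ready for feature matching (BRIEF). Each intensity test gives 0 for > and 1 for <=, which produces a binary string
- NNDR : accept a match only when the distance to the best match is clearly smaller than the distance to the second best match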
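
A BRIEF-style description step can be sketched as a list of pairwise intensity tests on a patch. The test pairs below are hypothetical and fixed by hand for the example (a real descriptor uses many more pairs, sampled in advance). The bit convention follows the notes: `>` gives 0, `<=` gives 1.

```python
def brief_descriptor(patch, pairs):
    """Build a bit string from pairwise intensity tests on an image patch."""
    bits = []
    for (y1, x1), (y2, x2) in pairs:
        # convention from the notes: '>' gives 0, '<=' gives 1
        bits.append("1" if patch[y1][x1] <= patch[y2][x2] else "0")
    return "".join(bits)


def hamming_distance(a: str, b: str) -> int:
    """Number of positions where two equal-length bit strings differ."""
    return sum(ca != cb for ca, cb in zip(a, b))


# hypothetical test pairs ((y1, x1), (y2, x2)) inside a 3x3 patch
PAIRS = [((0, 0), (2, 2)), ((0, 2), (2, 0)), ((1, 0), (1, 2)), ((0, 1), (2, 1))]

patch_a = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
patch_b = [[90, 80, 70], [60, 50, 40], [30, 20, 10]]

if __name__ == "__main__":
    desc_a = brief_descriptor(patch_a, PAIRS)
    desc_b = brief_descriptor(patch_b, PAIRS)
    print(desc_a, desc_b, hamming_distance(desc_a, desc_b))
```

Because the descriptor is a bit string, matching only needs a Hamming distance, which is what makes it inexpensive and memory efficient.
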
Detection & Tracking
- Face Detector : Haar-like features
  - training = find good Haar-like features
  - boosting : do it sequentially
  - strong learner : uses a small number of weak learners ->