- C++ shared object (.so) with Neon SIMD for Python, runnable on macOS (Ventura 13.3) and Linux (Ubuntu 22.04.02). Super fast when built with -O3
- C++ .so with Pybind11 for Python
The best template matching implementation on the Internet.
Using C++/MFC/OpenCV to build a Normalized Cross Correlation (NCC)-based image alignment algorithm
The result represents the similarity of two images, and the formula is as follows:
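As a reference point, a common definition of NCC is the zero-mean form (the one OpenCV exposes as `TM_CCOEFF_NORMED`): subtract each patch's mean, then divide the cross term by the square root of the product of the two variance sums. The sketch below assumes that variant, which may differ from the exact formula used by this project. The function name `normalizedCrossCorrelation` is hypothetical and the patches are plain flat arrays rather than `cv::Mat`.

```cpp
#include <cmath>
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <vector>

// Zero-mean normalized cross correlation of two equally sized patches,
// stored as flat arrays of pixel intensities. Returns a value in [-1, 1].
// Assumption: this is the textbook form, not necessarily the project's own.
double normalizedCrossCorrelation(const std::vector<double>& patch,
                                  const std::vector<double>& templ)
{
    if (patch.size() != templ.size() || patch.empty())
        throw std::invalid_argument("patches must be non-empty and equal in size");

    const double n = static_cast<double>(patch.size());
    const double meanP = std::accumulate(patch.begin(), patch.end(), 0.0) / n;
    const double meanT = std::accumulate(templ.begin(), templ.end(), 0.0) / n;

    double cross = 0.0, varP = 0.0, varT = 0.0;
    for (std::size_t i = 0; i < patch.size(); ++i) {
        const double dp = patch[i] - meanP;
        const double dt = templ[i] - meanT;
        cross += dp * dt;   // numerator: sum of products of deviations
        varP += dp * dp;    // sum of squared deviations of the patch
        varT += dt * dt;    // sum of squared deviations of the template
    }
    const double denom = std::sqrt(varP * varT);
    // A constant patch has zero variance; report "no correlation" instead of dividing by zero.
    return denom > 0.0 ? cross / denom : 0.0;
}
```

A score of 1 means the patch matches the template up to a brightness offset and a positive contrast gain, which is why the "Score" column in the results can be compared across findings.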
- Rotation invariant, with rotation precision as high as possible
- Uses an image pyramid as the search strategy, which runs 4 to 128 times faster than the original NCC method (depending on template size) by minimizing the inspection area on the top level of the image pyramid
- Optimizes the time OpenCV spends on rotation by setting the needed "size" and modifying the rotation matrix
- SIMD version of image convolution (especially useful for large templates)
  - 4.1 update: Neon SIMD in the macOS version of the .so, super fast
- Optimizes the function GetNextMaxLoc() with struct s_BlockMax, for special cases where the template is much smaller than the source image, and for a large TargetNumber. The gain is substantial.

Test case: Src10 (3648 X 3648) and Dst10 (54 X 54)
Effect: execution time drops from 534 ms to 100 ms, a 434% speedup
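The rotation item in the feature list (setting the needed "size" and modifying the rotation matrix) can be illustrated with plain math. The sketch below is one common way to do it, not necessarily this project's exact code: compute the tight canvas that holds the whole rotated image, build the same 2x3 matrix that `cv::getRotationMatrix2D` produces for scale 1, then add a translation so the image center lands on the canvas center. `RotationPlan` and `planRotation` are hypothetical names.

```cpp
#include <array>
#include <cmath>

struct RotationPlan {
    double width;                 // width of the canvas that holds the whole rotated image
    double height;                // height of that canvas
    std::array<double, 6> matrix; // row-major 2x3 affine matrix
};

// Plans the rotation of a srcWidth x srcHeight image by angleDeg, using the
// convention of cv::getRotationMatrix2D (positive angle = counter-clockwise
// with the image origin at the top-left corner).
RotationPlan planRotation(double srcWidth, double srcHeight, double angleDeg)
{
    const double kPi = 3.14159265358979323846;
    const double rad = angleDeg * kPi / 180.0;
    const double c = std::cos(rad);
    const double s = std::sin(rad);

    // Tight bounding box of the rotated rectangle.
    const double width  = srcWidth * std::fabs(c) + srcHeight * std::fabs(s);
    const double height = srcWidth * std::fabs(s) + srcHeight * std::fabs(c);

    const double cx = srcWidth / 2.0;
    const double cy = srcHeight / 2.0;

    // Translation terms of a rotation about the source center (cx, cy) ...
    double tx = (1.0 - c) * cx - s * cy;
    double ty = s * cx + (1.0 - c) * cy;
    // ... shifted so the source center maps to the center of the new canvas.
    tx += width / 2.0 - cx;
    ty += height / 2.0 - cy;

    return {width, height, {c, s, tx, -s, c, ty}};
}
```

With OpenCV, the resulting matrix and canvas size would be what you pass to `cv::warpAffine`, so no pixels are wasted on a canvas larger than the rotated image needs.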
Inspection Image : 4024 X 3036
Template Image: 762 X 521
Library | Index | Score | Angle | PosX | PosY | Execution Time |
---|---|---|---|---|---|---|
My Tool | 0 | 1 | 0.046 | 1725.857 | 1045.433 | 76ms 🎖️ |
My Tool | 1 | 0.998 | -119.979 | 2662.869 | 1537.446 | |
My Tool | 2 | 0.991 | 120.150 | 1768.936 | 2098.494 | |
Cognex | 0 | 1 | 0.030 | 1725.960 | 1045.470 | 125ms |
Cognex | 1 | 0.989 | -119.960 | 2663.750 | 1538.040 | |
Cognex | 2 | 0.983 | 120.090 | 1769.250 | 2099.410 | |
Aisys | 0 | 1 | 0 | 1726.000 | 1045.500 | 202ms |
Aisys | 1 | 0.990 | -119.935 | 2663.630 | 1539.060 | |
Aisys | 2 | 0.979 | 120.000 | 1769.630 | 2099.780 | |
Note: for the best performance, make sure you are using the Release version (of both this project and the OpenCV DLL). O2-related optimization settings significantly affect efficiency, and the difference between Debug and Release can be up to 7 times in some cases.
test0 - with user interface
test1 (164 ms, 80 ms (SIMD version), TargetNum=5, Overlap=0.8, Score=0.8, Tolerance Angle=180)
test2 (237 ms, 175 ms (SIMD version))
test3 (152 ms, 100 ms (SIMD version))
test4 (21 ms, Target Number=38, Score=0.8, Tolerance Angle=0, Min Reduced Area=256)
test5 (27 ms)
test6 (1157 ms, 657 ms (SIMD version), Target Number=15, Score=0.8, Tolerance Angle=180, Min Reduced Area=256)
test7 (18 ms, TargetNum=100, Score=0.5, Tolerance Angle=0, MaxOverlap=0.5, Min Reduced Area=1024)
1. Download Visual Studio 2017 or a newer version
2. Check the option "x86 and x64 version of C++ MFC"
3. Install
4. Open MatchTool.vcxproj
5. Upgrade the project if required
6. Open this project's property page
7. Change "General-Output Directory" to the .exe directory you want (usually the directory where your opencv_worldXX.dll is located)
8. Choose the SDK version you have in "General-Windows SDK Version"
9. Choose the toolset you have in "General-Platform Toolset" (for me, it is Visual Studio 2017 (v141))
10. Go to "VC++ Directories" and set "Include Directories" to your own OpenCV include path (e.g. C:\OpenCV3.1\opencv\build\include or C:\OpenCV4.0\opencv\build\include)
11. Set "Library Directories" to your own OpenCV library path (the directory where your opencv_worldXX.lib is located)
12. Go to "Linker-Input" and type in the library name (e.g. opencv_world310d_vs2017.lib or opencv_world401d.lib)
13. Make sure that your opencv_worldXX.dll and MatchTool.Lang are in the same directory as the .exe of this project
1. Select Debug_4.X or Release_4.X in "Solution Configuration"
2. Repeat steps 10 to 12 of the previous section (the OpenCV include directories, library directories, and linker input)
- Select the Language you want
- Drag Source Image to the Left Area
- Drag Dst Image to the Right Top Area
- Push the "Execute" button
- Target Number: the maximum number of objects you expect to find in the inspection image
- Max Overlap Ratio: (the overlap area between two findings) / (area of the golden sample)
- Score (Similarity): accepted similarity of findings (0~1); a lower score increases execution time
- Tolerance Angle: possible rotation of targets in the inspection image (180 means the search range is -180~180); a larger angle increases execution time. You can also push the "↓" button to select 2 angle ranges
- Min Reduced Area: the minimum area of the top level of the image pyramid (training stage)
- results are sorted by score (decreasing order)
- Angles: inspected rotation of findings
- PosX, PosY: pixel position of findings
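To make the Min Reduced Area parameter more concrete, here is a minimal sketch of how a pyramid depth could be derived from it. It assumes each pyramid level halves both dimensions (so the area shrinks by a factor of 4) and that the top level must keep at least Min Reduced Area pixels. The function name `pyramidTopLevel` is hypothetical, and the project's own stopping rule may differ.

```cpp
// Returns the number of pyramid levels above the original template such that
// the template area at the top level is still at least minReducedArea.
// Assumption: every level divides the area by 4 (width and height halved).
int pyramidTopLevel(int templWidth, int templHeight, int minReducedArea)
{
    if (minReducedArea < 1)
        minReducedArea = 1;  // guard against a non-terminating loop

    long long area = static_cast<long long>(templWidth) * templHeight;
    int level = 0;
    // Go one level higher only if the next level would still be large enough.
    while (area / 4 >= minReducedArea) {
        area /= 4;
        ++level;
    }
    return level;
}
```

Under these assumptions, a 762 X 521 template with Min Reduced Area=256 yields 5 levels above the original, while a 54 X 54 template yields only 1. A smaller Min Reduced Area gives a deeper pyramid and a faster coarse search, at the cost of less detail on the top level.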
contact information: dennisliu1993@gmail.com
- C++ shared library (.so) for python (Unix-ARM64, Ubuntu 22.04.02-ARM64)
- C++/MFC dll for .Net framework (Windows)
- pure C++ dll for Python (Windows)
- pybind11 .so
- Template Matching using Fast Normalized Cross Correlation
- An accelerating CPU based correlation-based image alignment (Computers and Electrical Engineering)
If you encounter an error (exception) in the constructor of the OpenCV class "RotatedRect", which might be due to Windows updates, modify the following code in types.cpp:

    RotatedRect::RotatedRect(const Point2f& _point1, const Point2f& _point2, const Point2f& _point3)
    {
        Point2f _center = 0.5f * (_point1 + _point3);
        Vec2f vecs[2];
        vecs[0] = Vec2f(_point1 - _point2);
        vecs[1] = Vec2f(_point2 - _point3);
        double x = std::max(norm(_point1), std::max(norm(_point2), norm(_point3)));
        double a = std::min(norm(vecs[0]), norm(vecs[1]));
        // check that given sides are perpendicular
        // this is the line you need to modify
        CV_Assert( std::fabs(vecs[0].ddot(vecs[1])) * a <= FLT_EPSILON * 9 * x * (norm(vecs[0]) * norm(vecs[1])) );
        // wd_i stores which vector (0,1) or (1,2) will make the width
        // One of them will definitely have slope within -1 to 1
        int wd_i = 0;
        if( std::fabs(vecs[1][1]) < std::fabs(vecs[1][0]) ) wd_i = 1;
        int ht_i = (wd_i + 1) % 2;
        float _angle = std::atan(vecs[wd_i][1] / vecs[wd_i][0]) * 180.0f / (float) CV_PI;
        float _width = (float) norm(vecs[wd_i]);
        float _height = (float) norm(vecs[ht_i]);
        center = _center;
        size = Size2f(_width, _height);
        angle = _angle;
    }

Increase the threshold value on the CV_Assert line (for example, replace the factor 9 with a larger number), then recompile the OpenCV source code.
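To see why a larger threshold helps, the perpendicularity test from the CV_Assert line can be reproduced in isolation. The sketch below mirrors that expression with plain doubles and exposes the constant 9 as a parameter. `Vec2`, `length`, `sidesArePerpendicular`, and `toleranceFactor` are hypothetical names for illustration and are not part of OpenCV.

```cpp
#include <algorithm>
#include <cfloat>
#include <cmath>

struct Vec2 { double x, y; };

double length(Vec2 v) { return std::hypot(v.x, v.y); }

// Mirrors the check in the CV_Assert line: the two sides p1->p2 and p2->p3
// count as perpendicular when their dot product is small relative to the
// point magnitudes. toleranceFactor plays the role of the constant 9.
bool sidesArePerpendicular(Vec2 p1, Vec2 p2, Vec2 p3, double toleranceFactor = 9.0)
{
    const Vec2 side0{p1.x - p2.x, p1.y - p2.y};
    const Vec2 side1{p2.x - p3.x, p2.y - p3.y};
    const double x = std::max(length(p1), std::max(length(p2), length(p3)));
    const double a = std::min(length(side0), length(side1));
    const double dot = side0.x * side1.x + side0.y * side1.y;
    return std::fabs(dot) * a
           <= FLT_EPSILON * toleranceFactor * x * (length(side0) * length(side1));
}
```

In this sketch, the corner points (0, 0), (100, 0), (100, 50) pass with the default factor. Moving the last point to (100.001, 50) fails with factor 9 but passes with factor 900. Small rounding errors in the corner points can trip the assertion in the same way, and a larger factor tolerates them.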