/carl-hauser

Open Source testing framework for image correlation, distance and analysis

Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0)

Carl-Hauser Project

Open Source Testing Framework for image correlation, distance and analysis. Strongly related to: Douglas-Quaid

Problem statement (@CIRCL)

A lot of the information collected or processed by CIRCL is related to images (largely photos, screenshots of websites or screenshots of sandboxes). The datasets keep growing, and analysts need to classify, search and correlate across all the images.

Target

The target is to build a generic library and services that can establish correlations between pictures. To achieve this goal, experiments need to be conducted; that is the purpose of this repository.

Getting Started

  • Review of existing algorithms, techniques and libraries for calculating distances between images (state of the art): Markdown | PDF version

Questions

Prerequisites

See requirements.txt

(...)

Installing

(...)

Running the tests

(...)

Running the benchmark evaluation

In /lib_testing, launch "python3 ./launcher.py". Parameters are hardcoded in launcher.py, such as:

  • Path to pictures folder
  • Output folder to store results
  • Requested outputs (result graph, statistics, LaTeX export, threshold evaluation, similarity matrix ...)
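
Since the parameters live in the code rather than on the command line, it can help to see what such a hardcoded block might look like. The sketch below is only an illustration: the class name, field names, paths and output keys are all hypothetical and are not taken from launcher.py. They simply mirror the three items listed above.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class LauncherConfig:
    """Hypothetical container for the values hardcoded in launcher.py."""

    # Folder containing the pictures to compare (placeholder path).
    pictures_folder: Path = Path("./datasets/pictures")
    # Folder where results are written (placeholder path).
    output_folder: Path = Path("./results")
    # Requested outputs; the keys mirror the list above, not real option names.
    outputs: dict = field(default_factory=lambda: {
        "result_graph": True,
        "statistics": True,
        "latex_export": False,
        "threshold_evaluation": True,
        "similarity_matrix": True,
    })

    def requested(self):
        """Return the names of the enabled outputs, sorted for stable display."""
        return sorted(name for name, enabled in self.outputs.items() if enabled)


config = LauncherConfig()
print(config.requested())
```

Grouping the values in one place like this makes it easier to see what has to be edited before a run; the real launcher may organise them differently.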

This currently works on most configurations and explores the following matching algorithms:

  • ImageHash Algorithms (A-hash, P-hash, D-hash, W-hash ... )
  • TLSH (be sure to use BMP pictures, or at least an uncompressed format. A function to convert pictures is available in /utility/manual.py)
  • ORB (and its parameter space)
  • ORB Bag-Of-Words / Bag-Of-Features (and its parameter space, including the size of the "Bag"/dictionary)
  • ORB RANSAC (with/without homography matrix filtering)
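
To give an idea of what a hash-based distance between two pictures means, here is a minimal pure-Python sketch of the average-hash (A-hash) principle followed by a Hamming distance. It is not the ImageHash library's implementation and not this repository's code: it assumes the picture has already been reduced to a small grayscale grid (the real A-hash first shrinks the image, typically to 8x8), and the function names and pixel values are made up for the example.

```python
def average_hash(pixels):
    """Return one bit per pixel: 1 if the pixel is brighter than the mean, else 0."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return [1 if value > mean else 0 for value in flat]


def hamming_distance(hash_a, hash_b):
    """Count the positions where two equal-length hashes differ."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return sum(bit_a != bit_b for bit_a, bit_b in zip(hash_a, hash_b))


# Tiny 2x2 grayscale grids standing in for already-downscaled pictures.
original = [[10, 200], [220, 30]]
brighter = [[20, 210], [230, 40]]   # same picture, uniformly brightened
modified = [[12, 190], [40, 35]]    # one bright area replaced by a dark one

print(hamming_distance(average_hash(original), average_hash(brighter)))
print(hamming_distance(average_hash(original), average_hash(modified)))
```

A uniform brightness change leaves the hash untouched, while a local change flips some bits. The smaller the distance, the more similar the two pictures are considered; the threshold evaluation output is what helps decide where to draw the line.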

You can also manually generate modified datasets from your original dataset:

  • Text detector and hider (DeepLearning, Tesseract, ...)
  • Edge detector (DeepLearning, Canny, ...)
  • PNG/BMP versions of pictures (compressed/uncompressed)
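
As an illustration of the last item, the sketch below converts every PNG in a folder to BMP. It assumes the Pillow library is installed and does not reproduce the conversion function in /utility/manual.py; the function names and folder arguments are hypothetical.

```python
from pathlib import Path


def bmp_target(source, output_folder):
    """Return the .bmp path for `source` inside `output_folder`."""
    return Path(output_folder) / (Path(source).stem + ".bmp")


def convert_folder_to_bmp(input_folder, output_folder):
    """Write a BMP copy of every PNG found in `input_folder`."""
    # Pillow is imported here so that the path helper above works without it.
    from PIL import Image

    output_folder = Path(output_folder)
    output_folder.mkdir(parents=True, exist_ok=True)
    written = []
    for source in sorted(Path(input_folder).glob("*.png")):
        target = bmp_target(source, output_folder)
        with Image.open(source) as picture:
            # Normalise the mode to RGB, which the BMP writer accepts.
            picture.convert("RGB").save(target, format="BMP")
        written.append(target)
    return written
```

Calling convert_folder_to_bmp on a dataset folder returns the list of files written, which can then be used as the input folder for a TLSH run.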

For Developers

(...)

Deployment

(...)

For the algorithm test library: see the installation instructions

Built With & Sources

Contributing

PRs are welcome. New issues are welcome if you have ideas or use cases that could improve the project.