ILabel

A school project, written in Java, that detects text and images in a document supplied as an image.

Technology used

  • ImageJ: a framework for image processing written in Java. This project is in fact a plugin for the standalone ImageJ application, but it can also run on its own
  • IJBlob: a small library that detects connected components in an image
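
To make "connected components" concrete, here is a minimal sketch in plain Java. It does not use the IJBlob API and is not the project's code: the class name `ConnectedComponents`, its methods, and the choice of 8-connectivity are all assumptions made for illustration. It takes a rectangular binary grid and gives every group of touching foreground pixels its own label.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical helper for illustration; not part of ILabel or IJBlob. */
final class ConnectedComponents {

    /**
     * Labels 8-connected foreground regions of a rectangular grid.
     * In the result, 0 marks background and 1..n identifies each component.
     */
    static int[][] label(boolean[][] foreground) {
        int height = foreground.length;
        int width = height == 0 ? 0 : foreground[0].length;
        int[][] labels = new int[height][width];
        int next = 0;
        Deque<int[]> queue = new ArrayDeque<>();

        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                // Skip background pixels and pixels that already have a label.
                if (!foreground[y][x] || labels[y][x] != 0) continue;

                // New component: flood-fill it breadth-first from this seed.
                next++;
                labels[y][x] = next;
                queue.add(new int[] {y, x});
                while (!queue.isEmpty()) {
                    int[] p = queue.poll();
                    for (int dy = -1; dy <= 1; dy++) {
                        for (int dx = -1; dx <= 1; dx++) {
                            int ny = p[0] + dy;
                            int nx = p[1] + dx;
                            if (ny < 0 || ny >= height || nx < 0 || nx >= width) continue;
                            if (!foreground[ny][nx] || labels[ny][nx] != 0) continue;
                            labels[ny][nx] = next;
                            queue.add(new int[] {ny, nx});
                        }
                    }
                }
            }
        }
        return labels;
    }

    /** Number of components, which equals the largest label in the grid. */
    static int count(int[][] labels) {
        int max = 0;
        for (int[] row : labels) {
            for (int value : row) {
                max = Math.max(max, value);
            }
        }
        return max;
    }
}
```

In a document image, each isolated glyph or picture fragment would typically come out as one component, which later steps can then measure and classify.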

How it works

First of all, we detect accented characters to make further analysis easier. Then the algorithm mainly relies on three heuristics to detect everything:

  • Binary Density Heuristic
  • Closure Effect Heuristic
  • Class Size Heuristic
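
This README does not define the heuristics; the slides linked in this section do. Purely as an illustration of the kind of measurement a density-based heuristic could build on, and not as the project's actual rule, the sketch below computes the fraction of foreground pixels inside a rectangle of a binary image. The class `BinaryDensity`, its method, and the half-open rectangle convention are hypothetical.

```java
/** Hypothetical helper showing one possible reading of "binary density". */
final class BinaryDensity {

    /**
     * Fraction of foreground pixels inside the rectangle [x0, x1) x [y0, y1).
     * The result lies between 0.0 (all background) and 1.0 (all foreground).
     */
    static double density(boolean[][] foreground, int x0, int y0, int x1, int y1) {
        if (x1 <= x0 || y1 <= y0) {
            throw new IllegalArgumentException("empty rectangle");
        }
        int on = 0;
        for (int y = y0; y < y1; y++) {
            for (int x = x0; x < x1; x++) {
                if (foreground[y][x]) {
                    on++;
                }
            }
        }
        return (double) on / ((x1 - x0) * (y1 - y0));
    }
}
```

A caller could compare this value against a threshold to separate region types; any such threshold would have to be tuned on real documents and is not taken from the project.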

More information on how it works is available in the following slides, with results at the end (in French): https://drive.google.com/open?id=0B6dZXfkEU79yT1NvMmNFTmViR0U

Can I use it in my project?

As it stands, no. Our method is far from perfect, even though it yields correct results most of the time. We could have started from a better algorithm, such as MSER, and built on it. More information about the MSER algorithm: https://en.wikipedia.org/wiki/Maximally_stable_extremal_regions Moreover, there are far better libraries for this, such as OpenCV.