SHARK2-implementatation

Implementation of Shark2 Algorithm

There are five steps:

Sampling

Pruning

Shape Score

Location Score

Compute Integration score and return top-n words

Sampling

The user template and the word templates can be simply viewed as vectors. In order to meaninfully compare two vectors they have be of equal size. We have chosen 100 to be the size of our vectors. So we need to sample 100 equidistant points in the user template and the model template. The approach is the following:

Compute the path length template length by calculating pairwise points length on the template. In order to get 100 equidistant points I choose number of points on each segment in proportion to how much it would contribute to the overall path length. We apply the same algorithm for sampling equidistant points for both user and word template. While in practice this algorithm doesn't guarantee perfectly equidistant points it's quite effective and efficient. Pruning

In order to reduce the number of potential candidates for matching, we retain only those words which lie close match to the starting and end character of the user template. The threshold is set at 15 units. Shape Score

Summation of average euclidean distance between the points in the user template and the word template. Location Score

Implemented as given in the paper. Return Top N Words

Have set the weights as 0.9 for shape score and 0.1 for location score. Imposing more penalty for deviations in shape score. Sort the list by integration scores and return the lowest one. Since it is very unlikely that the scores are going to overlap returning the words which lie in vicinity to the closest match too. The threshold was set at 5.