A Java web application that compares two or more text documents for similarity.
We live in a technological era in which data is gathered very quickly, forming so-called "Big Data": a collection of most of what happens over the internet. Big data engineering is an important field of computing that involves turning data into meaningful information, for example determining the habits of internet users in order to predict their next step or search query, or to suggest something from a particular user's area of interest. Another large aspect of working with data is document comparison, which detects duplication by matching letters, words and sub-sentences. Pattern matching is heavily involved in Big Data computing and in AI development and training. It is also now necessary to analyze large amounts of data in a computationally efficient manner, i.e., with low space and time complexity.
To compare large documents for similarity, a commonly used technique is to represent the documents as sets of letters, words or sub-sentences and to measure the similarity between those sets using the Jaccard Index.
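
For two sets A and B, the Jaccard Index is the size of their intersection divided by the size of their union, J(A, B) = |A ∩ B| / |A ∪ B|, so it ranges from 0 (nothing in common) to 1 (identical sets). The following is a minimal sketch of that idea, not this application's actual code: the class name JaccardSketch, the method names shingles and jaccard, and the choice of lowercase word groups of size k as set elements are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration class; not taken from the application itself.
public class JaccardSketch {

    // Represents a text as the set of its groups of k consecutive lowercase words.
    // The word-group representation is an assumption; letters or sub-sentences work the same way.
    public static Set<String> shingles(String text, int k) {
        Set<String> result = new HashSet<>();
        String trimmed = text.toLowerCase().trim();
        if (trimmed.isEmpty()) {
            return result; // blank text yields an empty set
        }
        String[] words = trimmed.split("\\s+");
        for (int i = 0; i + k <= words.length; i++) {
            result.add(String.join(" ", Arrays.copyOfRange(words, i, i + k)));
        }
        return result;
    }

    // Jaccard Index: size of the intersection divided by size of the union.
    // Two empty sets are treated as identical (1.0) here; that convention is an assumption.
    public static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }

    public static void main(String[] args) {
        Set<String> a = shingles("the quick brown fox", 2);
        Set<String> b = shingles("the quick red fox", 2);
        // One shared pair ("the quick") out of five distinct pairs.
        System.out.println(jaccard(a, b)); // prints 0.2
    }
}
```

Comparing the exact sets like this needs memory proportional to the number of distinct elements in each document, which is the cost that more space-efficient approaches try to reduce for very large inputs.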