khmelev03

Khmelev, D.V., & Teahan, W.J. (2003). A repetition based measure for verification of text collections and for text categorization. In Proceedings of the 26th ACM SIGIR (pp. 104–110).

Primary language: C++

How to start the program:

The program expects the corpora in the format used by the PAN competition.

To run it, start it from the command prompt with the parameters -i path/to/training/corpus and -o path/to/output/directory.

The algorithm has been improved so that it answers "candidate00000" if the real author cannot be clearly determined.