Author_Multilevel_Ngram_Profiles

Multilevel N-grams for authorship attribution and profiling

This is an R script for calculating sequential word and character n-gram vectors from a corpus of texts. These vectors are used to train SVM and Random Forest classifiers in order to perform authorship attribution or author profiling. The AMNP (Author's Multilevel N-gram Profiles) method is described in the following papers:

  • Mikros, G. K., & Perifanos, K. (2011). Authorship identification in large email collections: Experiments using features that belong to different linguistic levels. In Proceedings of PAN 2011 Lab, Uncovering Plagiarism, Authorship, and Social Software Misuse, held in conjunction with the CLEF 2011 Conference on Multilingual and Multimodal Information Access Evaluation, 19-22 September 2011, Amsterdam.

  • Mikros, G. K., & Perifanos, K. (2013). Authorship attribution in Greek tweets using multilevel author’s n-gram profiles. In E. Hovy, V. Markman, C. H. Martell & D. Uthus (Eds.), Papers from the 2013 AAAI Spring Symposium "Analyzing Microtext", 25-27 March 2013, Stanford, California (pp. 17-23). Palo Alto, California: AAAI Press.
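
As a rough illustration of the kind of feature vectors described above, here is a minimal base-R sketch that builds character and word bigram profiles for two toy texts and joins them into one document-by-feature matrix. It is not the code from this repository: the helper names (`char_ngrams`, `word_ngrams`, `profile_matrix`), the `c2:`/`w2:` column prefixes, the tokenization rule, and the use of relative frequencies are all assumptions made for the example.

```r
# Hypothetical helper: overlapping character n-grams of a single string.
char_ngrams <- function(text, n) {
  len <- nchar(text)
  if (len < n) return(character(0))
  starts <- seq_len(len - n + 1)
  substring(text, starts, starts + n - 1)
}

# Hypothetical helper: word n-grams after a simple lowercase/alphanumeric split
# (an assumed tokenization, chosen only to keep the example short).
word_ngrams <- function(text, n) {
  words <- strsplit(tolower(text), "[^[:alnum:]]+")[[1]]
  words <- words[nzchar(words)]
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  vapply(starts,
         function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Hypothetical helper: turn a list of n-gram vectors (one per document) into a
# matrix of relative frequencies over the shared vocabulary.
# Assumes at least one document contains at least one n-gram.
profile_matrix <- function(grams_by_doc, prefix) {
  vocab <- sort(unique(unlist(grams_by_doc, use.names = FALSE)))
  rows <- lapply(grams_by_doc, function(g) {
    counts <- table(factor(g, levels = vocab))
    as.numeric(counts) / max(length(g), 1)
  })
  m <- do.call(rbind, rows)
  colnames(m) <- paste0(prefix, vocab)
  m
}

# Toy corpus (invented for illustration).
texts <- c(doc1 = "the cat sat on the mat",
           doc2 = "the dog sat on the log")

# Combine two levels (character bigrams and word bigrams) side by side.
features <- cbind(
  profile_matrix(lapply(texts, char_ngrams, n = 2), "c2:"),
  profile_matrix(lapply(texts, word_ngrams, n = 2), "w2:")
)
dim(features)
```

Each row of `features` is one document and each column one n-gram, so a matrix of this shape, paired with author labels, is what a classifier such as an SVM or Random Forest would take as input. Other n-gram sizes can be added by calling the helpers with a different `n` and binding the resulting columns.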