Genre Complexity

Code for the experiments in the paper "Poetry, Songs, Literature, Legalese and Translationese: Automated Sentence Complexity Perspective".


Natural texts vary in complexity, although that complexity is non-trivial to measure. As a result, domains and genres can be compared by how complex they are. This study focuses on sentence complexity: I use automated methods of complexity estimation to compare poetry, natural prose, literary prose, and machine and human translation. The conclusion is that old poetry and old literature are more complex than their modern counterparts, as measured by language model complexity, Flesch Reading Ease and syntactic depth. Furthermore, I observe that machine translations are faithful to human references in terms of sentence complexity, which is a positive result for the translation industry. Most importantly, this paper discusses why complexity differs across text domains, which is framed as "form (complexity) follows function and aesthetics with least effort."
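As an illustration of one of the metrics named above, the following is a minimal sketch of the standard Flesch Reading Ease formula, 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words). It is not taken from this repository: the function names `count_syllables` and `flesch_reading_ease` are hypothetical, and the regex-based sentence splitting and vowel-group syllable counting are rough assumptions that a real pipeline would likely replace with a proper tokenizer and pronunciation dictionary. In the standard formulation higher scores indicate easier text, so a score would need to be inverted to fit a "lower is simpler" convention; how the study handles this is not stated here.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per group of consecutive vowels."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing "e" as silent (e.g. "make"), except in "-le" endings (e.g. "table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Standard Flesch Reading Ease; higher scores mean easier text."""
    # Naive segmentation: sentences end at ., ! or ?; words are runs of letters.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )


if __name__ == "__main__":
    print(flesch_reading_ease("The cat sat on the mat."))
```

Short sentences made of short words score high under this formula, while long sentences with polysyllabic words score low, which is the direction of difference the study reports between modern and old texts.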

Average sentence complexity metrics across textual genres and domains. Lower values mean simpler texts.