/telenovela-transcripts

Analyze language used in Spanish-language novelas using corpus linguistics tools

Primary Language: Python

Analyze Spanish-Language Telenovela Transcripts

This Python code converts .htm pages (batch downloaded with DownThemAll) into .txt files by extracting the text contained in the <p> tags. It is best suited to episode transcripts of Univision telenovelas. The resulting .txt files can then be analyzed in AntConc for the most frequent words or phrases using the Clusters/N-Grams option.
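The conversion step described above can be sketched roughly as follows. This is a minimal illustration and not necessarily the code in this repository: the names ParagraphExtractor, extract_paragraphs, and convert_folder are hypothetical, the sketch uses only the standard library's html.parser, and it assumes the downloaded pages are UTF-8 encoded.

```python
from html.parser import HTMLParser
from pathlib import Path


class ParagraphExtractor(HTMLParser):
    """Collect the text found inside <p> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._in_p = False
        self._chunks = []
        self.paragraphs = []

    def _flush(self):
        # Join the collected pieces and collapse runs of whitespace.
        text = " ".join("".join(self._chunks).split())
        if text:
            self.paragraphs.append(text)
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # HTML allows unclosed <p>, so a new <p> ends the previous one.
            if self._in_p:
                self._flush()
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            self._flush()
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._chunks.append(data)

    def close(self):
        super().close()
        # Keep any text from a <p> that was never closed.
        self._flush()


def extract_paragraphs(html):
    """Return a list with the text of each <p> element in an HTML string."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


def convert_folder(src_dir, dst_dir):
    """Write one .txt file per .htm file, one paragraph per line."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for path in sorted(Path(src_dir).glob("*.htm")):
        # Assumption: pages are UTF-8; undecodable bytes are replaced.
        html = path.read_text(encoding="utf-8", errors="replace")
        out_path = dst / (path.stem + ".txt")
        out_path.write_text("\n".join(extract_paragraphs(html)), encoding="utf-8")
```

For example, convert_folder("downloads", "transcripts") would read every .htm file in a folder named downloads and write matching .txt files into transcripts; both folder names are placeholders. The output folder can then be opened in AntConc.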

Watch the instructional video, with sample results of frequency analysis in the description: Using Python and AntConc to Analyze Spanish-Language Telenovela Transcripts

Update: Univision's website has since changed, and it no longer appears possible to download transcripts from the captions.