Analyze Spanish-Language Telenovela Transcripts
This Python code converts .htm pages (batch-downloaded with DownThemAll) into .txt files by extracting the text contained in the <p> tags, so it is best suited to transcripts of Univision telenovela episodes. The resulting .txt files can then be analyzed in AntConc for the most frequent words or phrases using the Clusters/N-Grams option.
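The conversion step described above could look roughly like the following. This is a minimal sketch, not the project's actual script: it uses only the Python standard library, and the names `ParagraphExtractor`, `extract_paragraphs`, and `convert_folder`, as well as the default UTF-8 encoding, are assumptions made for illustration.

```python
from html.parser import HTMLParser
from pathlib import Path


class ParagraphExtractor(HTMLParser):
    """Collect the text found inside <p> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._in_p = False
        self._buffer = []
        self.paragraphs = []

    def _flush(self):
        # Join the collected pieces and collapse runs of whitespace.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.paragraphs.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # HTML often omits </p>, so a new <p> closes the previous one.
            self._flush()
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._buffer.append(data)


def extract_paragraphs(html):
    """Return the text of every <p> element in an HTML string."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    parser._flush()  # keep a final paragraph that was never closed
    return parser.paragraphs


def convert_folder(src_dir, dst_dir, encoding="utf-8"):
    """Write one .txt file per .htm file; the encoding is an assumption."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for page in sorted(Path(src_dir).glob("*.htm")):
        html = page.read_text(encoding=encoding, errors="replace")
        out_path = dst / (page.stem + ".txt")
        out_path.write_text("\n".join(extract_paragraphs(html)), encoding="utf-8")
        written.append(out_path)
    return written
```

Calling `convert_folder("transcripts_htm", "transcripts_txt")` (both folder names are placeholders) would produce one plain-text file per downloaded page, ready to be opened in AntConc. If the downloaded pages use a different encoding, such as Latin-1, pass it through the `encoding` argument so accented characters survive.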
Watch the instructional video, whose description includes sample results of the frequency analysis: "Using Python and AntConc to Analyze Spanish-Language Telenovela Transcripts"
Update: Univision's website has since changed, and it no longer appears possible to download transcripts from the captions.