/negations

Primary LanguagePython

Negations: extended work

Duplicating sentences with multiple negations in the Spanish and French corpora.

Spanish corpus SFU ReviewSP-NEG

one_sent_SFU2MultiBERT.py: is written as a test script for only one file from the corpus. It extracts annotations the way I did it in my thesis. It does not split sentences with multiple negations.

one_sent_duplicate.py: I try to extract anno and duplicate sentences using ElementTree. I iterate through each sentence as many times as there are negation structures in total. I also order negation structures by counting them incrementally. If the order of a negation structure == the order of iteration, I collect the anno for the cue.

PROBLEM: .getroot() uses actual file as a root. I cannot find a way to use sentence node as a root.

process_top_level_negs_SP.py: A copy of try_this.py which processes the entire Spanish corpus. Currently, it counts the number of top level neg_structures and creates as many copies of a sentence as there neg_structures.

Current stats:

  • 3076 all_neg_sents
  • 4113 newly collected sents
  • 4327 all_neg_structures
  • 4327 nested_neg_sents
  • 214 lost neg_structures
  • (in my thesis I had collected 2197 OneScope sents)