
XSLT script to extract all motion and arrival verbs into a separate untokenized document


I need to aggregate all instances of motion and arrival verbs in a single document, without the tokenized text.

The XSLT should:

  • Search for a list of key words in the English translations, i.e. the span[@ana='#S' and @xml:lang='en'] elements inside following-sibling::spanGrp[@type='annotations'][1] (as in <xsl:for-each select="$annotations/span[@ana='#S' and @xml:lang='en']">)
    Key words should include: ("come", "coming", "came", "comes", "go", "went", "goes", "going", "arrive", "arrives", "arrived", "got there", "got here", "went home", "go home", "come home", "came home", ...)

  • For each sentence whose following-sibling::spanGrp[@type='annotations'][1] contains one of the key words:
    - copy the text at the <span type="S"> level (with spaces removed)
    - then, one tab over, copy span[@ana='#S' and @xml:lang='en']

  • The results can then be refined manually in a spreadsheet
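
A rough starting point for the steps above, as a sketch rather than a finished script. It assumes an XSLT 2.0 processor (it uses matches(), string-join() and the *: namespace wildcard), that each sentence is a <span type="S"> whose next sibling spanGrp[@type='annotations'] holds the English translation, and that "remove spaces" means collapsing extra whitespace with normalize-space(). The variable names $keywords, $pattern and $translations are made up for illustration, and the key word list is only the part spelled out above, so it would need extending.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Plain-text output: one sentence per line, columns separated by a tab,
       so the result can be opened directly in a spreadsheet. -->
  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Key words as listed in this issue (the list is open-ended; extend as needed). -->
  <xsl:variable name="keywords" select="('come', 'coming', 'came', 'comes',
      'go', 'went', 'goes', 'going', 'arrive', 'arrives', 'arrived',
      'got there', 'got here', 'went home', 'go home', 'come home', 'came home')"/>

  <!-- XPath regexes have no \b, so whole-word matching is approximated with
       (^|\W) ... (\W|$). This keeps 'go' from matching inside 'ago' or 'good'. -->
  <xsl:variable name="pattern"
      select="concat('(^|\W)(', string-join($keywords, '|'), ')(\W|$)')"/>

  <xsl:template match="/">
    <!-- Assumption: sentences are <span type="S"> elements in any namespace. -->
    <xsl:for-each select="//*:span[@type='S']">
      <xsl:variable name="annotations"
          select="following-sibling::*:spanGrp[@type='annotations'][1]"/>
      <xsl:variable name="translations"
          select="$annotations/*:span[@ana='#S' and @xml:lang='en']"/>

      <!-- Keep the sentence only if some English translation contains a key word
           (case-insensitive). -->
      <xsl:if test="some $t in $translations
                    satisfies matches(string($t), $pattern, 'i')">
        <!-- Column 1: sentence text with whitespace collapsed. -->
        <xsl:value-of select="normalize-space(.)"/>
        <xsl:text>&#9;</xsl:text>
        <!-- Column 2: the English translation(s). -->
        <xsl:value-of select="string-join(
            for $t in $translations return normalize-space($t), ' ')"/>
        <xsl:text>&#10;</xsl:text>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Since "went", "go", "come" and "came" already match on their own, the multi-word entries such as "went home" do not change which sentences are selected; they are kept only to mirror the list above. If "remove spaces" is meant literally (no whitespace at all in the first column), replacing normalize-space(.) with translate(., ' &#9;&#10;&#13;', '') would be one way to do that.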