CATENA

CAusal and TEmporal relation extraction from NAtural language texts

CATENA is a sieve-based system for temporal and causal relation extraction and classification from English texts, exploiting the interaction between the temporal and the causal model. The system requires text pre-annotated with EVENT and TIMEX3 tags according to the TimeML annotation standard, as these annotations are used as features to extract the relations.

###Requirements

  • Java Runtime Environment (JRE) 1.7.x or higher

#####Text processing tools:

#####Other libraries:

#####Other resources:

  • Temporal and causal signal lists, available in resource/. This folder must be placed within the root folder of the project.
  • Classification models, available in models/, including: catena-event-timex.model, catena-event-dct.model, catena-event-event.model and catena-causal-event-event.model.

###Usage

! The input file(s) must be in the TimeML annotation format !

usage: Catena
 -i,--input <arg>        Input TimeML file/directory path
        
 -x,--textpro <arg>      TextPro directory path
 -l,--matelemma <arg>    Mate tools' lemmatizer model path   
 -g,--matetagger <arg>   Mate tools' PoS tagger model path
 -p,--mateparser <arg>   Mate tools' parser model path      
 
 -t,--ettemporal <arg>   CATENA model path for E-T temporal classifier    
 -d,--edtemporal <arg>   CATENA model path for E-D temporal classifier                       
 -e,--eetemporal <arg>   CATENA model path for E-E temporal classifier
 -c,--eecausal <arg>     CATENA model path for E-E causal classifier
 
 -b,--train              (optional) Train the models
 -m,--tempcorpus <arg>   (optional) TimeML directory path for training temporal
                         classifiers
 -u,--causcorpus <arg>   (optional) TimeML directory path for training causal
                         classifier     

The output is a list of temporal and/or causal relations, one relation per line, in the following format:

<filename>	<entity_1>	<entity_2>	<TLINK_type/CLINK/CLINK-R>
  
  TLINK_type			One of TLINK types according to TimeML, e.g., BEFORE, AFTER, SIMULTANEOUS, etc.
  CLINK					entity_1 CAUSE entity_2
  CLINK-R				entity_1 IS_CAUSED_BY entity_2
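
For downstream processing, the tab-separated output above can be read back with a few lines of Java. The following is a minimal sketch, not part of CATENA itself: the class `RelationLine` and its methods are hypothetical names introduced here, and it assumes each output line has exactly four tab-separated fields, as the format description suggests.

```java
/** Hypothetical holder for one line of CATENA output; not part of the CATENA code base. */
final class RelationLine {
    final String filename;
    final String entity1;
    final String entity2;
    final String label; // a TLINK type (e.g. BEFORE), "CLINK" or "CLINK-R"

    RelationLine(String filename, String entity1, String entity2, String label) {
        this.filename = filename;
        this.entity1 = entity1;
        this.entity2 = entity2;
        this.label = label;
    }

    /** Parses one output line; assumes exactly four tab-separated fields. */
    static RelationLine parse(String line) {
        String[] fields = line.split("\t");
        if (fields.length != 4) {
            throw new IllegalArgumentException(
                "Expected 4 tab-separated fields, got " + fields.length + ": " + line);
        }
        return new RelationLine(fields[0].trim(), fields[1].trim(),
                                fields[2].trim(), fields[3].trim());
    }

    /** True for CLINK and CLINK-R, false for any TLINK type. */
    boolean isCausal() {
        return label.equals("CLINK") || label.equals("CLINK-R");
    }

    /** The causing entity: entity_1 for CLINK, entity_2 for CLINK-R. */
    String cause() {
        requireCausal();
        return label.equals("CLINK") ? entity1 : entity2;
    }

    /** The caused entity: entity_2 for CLINK, entity_1 for CLINK-R. */
    String effect() {
        requireCausal();
        return label.equals("CLINK") ? entity2 : entity1;
    }

    private void requireCausal() {
        if (!isCausal()) {
            throw new IllegalStateException("Not a causal relation: " + label);
        }
    }
}
```

For example, `RelationLine.parse("doc.tml\te1\te2\tCLINK-R").cause()` yields the second entity, since CLINK-R means entity_1 IS_CAUSED_BY entity_2. The file name and entity identifiers in this call are made up for illustration.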

###System architecture

CATENA contains two main modules:

  1. Temporal module, a combination of rule-based and supervised classifiers, with a temporal reasoner module in between.
  2. Causal module, a combination of a rule-based classifier based on causal verbs and a supervised classifier that takes into account syntactic and context features, especially causal signals appearing in the text.
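
To give a feel for what a temporal reasoner can contribute between the rule-based and supervised sieves, the sketch below applies a single inference rule, the transitivity of BEFORE, to a set of already labelled pairs. This README does not describe how CATENA's reasoner is implemented, so the class `BeforeClosure` and its method are purely illustrative assumptions and cover only one of the many inferences a full reasoner would make.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only: closes a set of BEFORE pairs under transitivity. Not CATENA's reasoner. */
final class BeforeClosure {

    /**
     * Each pair is a two-element list [earlier, later].
     * Returns the input pairs plus every pair implied by
     * "A BEFORE B and B BEFORE C implies A BEFORE C".
     * Inconsistent (cyclic) input is not detected.
     */
    static Set<List<String>> close(Set<List<String>> before) {
        Set<List<String>> closed = new LinkedHashSet<List<String>>(before);
        boolean changed = true;
        while (changed) {
            changed = false;
            // Iterate over a snapshot so new pairs can be added safely.
            List<List<String>> snapshot = new ArrayList<List<String>>(closed);
            for (List<String> ab : snapshot) {
                for (List<String> bc : snapshot) {
                    if (ab.get(1).equals(bc.get(0))
                            && closed.add(Arrays.asList(ab.get(0), bc.get(1)))) {
                        changed = true;
                    }
                }
            }
        }
        return closed;
    }
}
```

Pairs inferred this way no longer need to be passed to the supervised classifiers, which is the general motivation for placing a reasoner between the sieves.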

The two modules interact, based on the assumption that the notion of causality is tightly connected with the temporal dimension: (i) TLINK labels for event-event pairs, resulting from the rule-based sieve + temporal reasoner, are used as features for the CLINK classifier, and (ii) CLINK labels are used in a post-editing step to correct event pairs wrongly labelled by the Temporal module.
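
Step (ii) can be pictured as a small post-editing pass over the temporal labels. The sketch below is an assumption-laden illustration rather than CATENA's actual code: it assumes that a cause precedes its effect, so CLINK implies BEFORE and CLINK-R implies AFTER, and it keys each event pair by a hypothetical `"entity_1<TAB>entity_2"` string. The class name `CausalPostEditor` is likewise invented here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative post-editing of TLINK labels using CLINK labels. Not CATENA's implementation. */
final class CausalPostEditor {

    /** Assumed mapping: a cause precedes its effect. Returns null for non-causal labels. */
    static String impliedTlink(String causalLabel) {
        if ("CLINK".equals(causalLabel)) {
            return "BEFORE";   // entity_1 CAUSE entity_2
        }
        if ("CLINK-R".equals(causalLabel)) {
            return "AFTER";    // entity_1 IS_CAUSED_BY entity_2
        }
        return null;
    }

    /**
     * Returns a copy of tlinks in which every pair that also has a causal label
     * carries the temporal label implied by that causal label.
     * Pairs without an existing TLINK are left out; the inputs are not modified.
     */
    static Map<String, String> postEdit(Map<String, String> tlinks,
                                        Map<String, String> clinks) {
        Map<String, String> edited = new LinkedHashMap<String, String>(tlinks);
        for (Map.Entry<String, String> entry : clinks.entrySet()) {
            String implied = impliedTlink(entry.getValue());
            if (implied != null && edited.containsKey(entry.getKey())) {
                edited.put(entry.getKey(), implied);
            }
        }
        return edited;
    }
}
```

Under these assumptions, a pair labelled AFTER by the Temporal module but CLINK by the Causal module would be rewritten to BEFORE, while pairs with no causal label keep their temporal label.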

#####Publication

Paramita Mirza and Sara Tonelli. 2016. CATENA: CAusal and TEmporal relation extraction from NAtural language texts. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, Osaka, Japan, December. [pdf]

#####Dataset

  • Training data for the Temporal module is taken from the TempEval-3 shared task, particularly the combination of TBAQ-cleaned (English training data) and TE3-platinum (English test data).
  • Training data for the Causal module is Causal-TimeBank, the TimeBank corpus annotated with causal information.
  • The TimeBank-Dense corpus is used in one of the evaluation schemes for temporal relation extraction.
  • Causal-TempEval3-eval.txt (available in data/) is used in one of the evaluation schemes for causal relation extraction.

! Whenever making reference to this resource please cite the paper in the Publication section. !

###Web Service

Soon!

###Contact

For more information please contact Paramita Mirza (paramita135@gmail.com).