/TEI

TimeBankPT Event Identification

Primary LanguageJupyter NotebookOtherNOASSERTION

TEI

TEI - TimeBankPT Event Identification Open In Colab

Docker

DESCRIPTION

TEI is an event trigger identifier system for sentences in the Portuguese language. It locates the event trigger terms in a sentence. The model was trained on the TimeBankPT (COSTA; BRANCO,2012) corpus.

The system outputs the identified events in the following Json format:

[
    {
        "text": "Vazamentos",
        "start": 0,
        "end": 10
    },
    {
        "text": "expõem",
        "start": 20,
        "end": 26
    },
    {
        "text": "diz",
        "start": 62,
        "end": 65
    }
]

Local Execution

Prerequisites

  1. Download and place the BERTimbau Base (SOUZA; NOGUEIRA;LOTUFO, 2020) model and vocabulary file:
    $ wget https://neuralmind-ai.s3.us-east-2.amazonaws.com/nlp/bert-base-portuguese-cased/bert-base-portuguese-cased_tensorflow_checkpoint.zip
    $ wget https://neuralmind-ai.s3.us-east-2.amazonaws.com/nlp/bert-base-portuguese-cased/vocab.txt
    Then unzip and place it in the the models directory as follows:
    ├──models
    |      └── BERTimbau
    |               └── bert_config.json
    |               └── bert_model.ckpt.data-00000-of-00001
    |               └── bert_model.ckpt.index
    |               └── bert_model.ckpt.meta
    |               └── vocab.txt
    |
    |...
    
  2. Install the packages.
    $ pip install -r requirements.txt

OPTIONS

-h, --help                           Print this help text and exit
--sentence  SENTENCE                 Sentence string to identify events from
--dir   INPUT-DIR OUTPUT-DIR         Identify events from files of input directory
	                             (one sentence per line) and write output json
				     files on output directory.

EVENT IDENTIFICATION FROM A DIRECTORY OF FILES

The text files in the input directory are expected to have the format:

* all text files end with the extension .txt
* sentences are separated by newlines
$ python3 src/tei.py --dir /tmp/input-dir /tmp/output-dir

EVENT IDENTIFICATION FROM A SENTENCE

$ python3 src/tei.py --sentence 'Vazamentos de dados expõem senhas de funcionários do governo, diz relatório.'

How to cite this work

Peer-reviewed accepted paper:

  • Sacramento, A., Souza, M.: Joint Event Extraction with Contextualized Word Embeddings for the Portuguese Language. In: 10th Brazilian Conference on Intelligent System, BRACIS, São Paulo, Brazil, from November 29 to December 3, 2021.