Syntactic/temporal web annotation tool

Contemplata

Contemplata is a web-based annotation tool developed for the Temporal@ODIL project. The ultimate goal of this project is to annotate a portion of the ANCOR spoken French corpus with semantic (more precisely, temporal) information. To this end, Contemplata allows:

  • Merging/splitting speech turns into syntactically coherent units,
  • Removing (either automatically or manually) selected expressions that are uninteresting from the semantic point of view (e.g., expressions related to social obligations),
  • Correcting constituency trees obtained with a syntactic parser plugged into Contemplata (it can thus be used as a purely syntactic annotation tool, regardless of its temporal annotation functionalities),
  • Annotating temporal entities on top of the syntactic structures,
  • Linking the entities with temporal relations.

Installation

First, clone the Contemplata repository into a local directory.

git clone https://github.com/kawu/contemplata.git
cd contemplata

Then proceed with the installation of the back-end server, the front-end annotation tool, and (optionally) the third-party syntactic analysis tools, as explained below.

Back-end

To install the back-end, you will need to download and install the Haskell Tool Stack on your machine beforehand. You can use the latest stable version of the tool.

Then, move to the backend directory and run the installation process with stack.

cd backend
stack install
cd ..

Under Linux, this command will (by default) install the contemplata-server command-line tool in the ~/.local/bin directory. You can either add this directory to your $PATH, or use the full path to run contemplata-server:

~/.local/bin/contemplata-server --help
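
If you prefer the first option, the lines below are a minimal sketch for a POSIX-compatible shell. They only affect the current session; to make the change permanent you would put the same lines into your shell's startup file, and which file that is depends on your shell rather than on Contemplata.

```shell
# Prepend ~/.local/bin to PATH unless it is already listed there.
case ":$PATH:" in
    *":$HOME/.local/bin:"*) ;;                       # already present, nothing to do
    *) PATH="$HOME/.local/bin:$PATH"; export PATH ;; # add it for this session
esac
```

After this, contemplata-server can be called without its full path.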

Protocol buffers

If you encounter the following error during compilation:

protoc: callProcess: runInteractiveProcess: exec: does not exist

then you need to install Protocol Buffers and retry with stack install.
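
To avoid finding this out halfway through a long build, you can check beforehand whether the protoc compiler is visible on your PATH. The sketch below assumes a POSIX-compatible shell; have_tool is a hypothetical helper name, not part of Contemplata.

```shell
# Report whether a command is available on the PATH (POSIX `command -v`).
have_tool() {
    command -v "$1" > /dev/null 2>&1
}

# Check for the Protocol Buffers compiler before running stack install.
if have_tool protoc; then
    echo "protoc found: $(command -v protoc)"
else
    echo "protoc not found: install Protocol Buffers, then rerun stack install" >&2
fi
```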

Avoid recompilation of protocol buffer files

By default, the setup tool will generate Haskell files from the protocol buffer files (responsible for communication with the Stanford parser) each time you run stack install. However, this step needs to be performed only once. In order to skip it for subsequent builds, replace:

buildProtos :: Bool
buildProtos = True

with:

buildProtos :: Bool
buildProtos = False

in the Setup.hs file.
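
The same edit can be scripted, which may be convenient in automated builds. This is only a sketch: disable_build_protos is a hypothetical helper, and it assumes that the definition sits on a single line beginning with buildProtos, as in the snippet above.

```shell
# Switch buildProtos from True to False in the given Setup.hs file.
# A temporary file is used instead of `sed -i`, whose syntax differs
# between GNU and BSD sed.
disable_build_protos() {
    setup_file=$1
    tmp_file=$(mktemp) || return 1
    sed 's/^buildProtos[[:space:]]*=[[:space:]]*True[[:space:]]*$/buildProtos = False/' \
        "$setup_file" > "$tmp_file" &&
        cat "$tmp_file" > "$setup_file"   # overwrite in place, keeping permissions
    result=$?
    rm -f "$tmp_file"
    return "$result"
}

# Usage (from the directory that contains Setup.hs):
#   disable_build_protos Setup.hs
```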

Front-end

To install the front-end application, you will need to install Elm beforehand.

WARNING: Contemplata requires Elm version 0.18 (and not the latest version 0.19 which introduced several breaking changes). Elm 0.18 can be installed with npm using the following command:

npm install -g elm@0.18

Once you have Elm installed, move to the annotool directory and generate the JavaScript application file.

cd annotool
elm-make src/Main.elm --output=main.js
cd ..

The --output option tells the compiler to generate a main.js JavaScript file rather than a stand-alone HTML file. You will then need to put the main.js file into a directory in which the web-server is run, as explained in the setup section below.

Third-party

You can optionally install one or both constituency parsers supported by Contemplata. This will allow the annotators to run these parsers directly via the annotation interface.

Stanford

Let $corenlp be the directory in which you wish to put the Stanford CoreNLP tool. You can download the tool from the CoreNLP webpage.

cd $corenlp
wget http://nlp.stanford.edu/software/stanford-corenlp-full-2017-06-09.zip
unzip stanford-corenlp-full-2017-06-09.zip

Next, you will need to obtain an appropriate parsing model. Currently, Contemplata is configured to work with the French models only (we plan to allow other languages in future versions). These models are also available on the CoreNLP website.

wget http://nlp.stanford.edu/software/stanford-french-corenlp-2017-06-09-models.jar

Finally, you can run the CoreNLP server, supplying it with (i) the path to CoreNLP and (ii) the path to the French models:

cd $contemplata/corenlp
./stanford-server-fr.sh $corenlp/stanford-corenlp-full-2017-06-09 $corenlp/stanford-french-corenlp-2017-06-09-models.jar

See also the README file for information about the CoreNLP French parsing model prepared within the context of the Temporal@ODIL project.

DiscoDOP

TODO

Usage

Before you can start the Contemplata application, you will need to set up an instance with its own dedicated database and configuration files.

Setup

You will need to prepare a dedicated environment to run Contemplata, i.e., a dedicated directory where the database and all the configuration files are stored. Under Linux, assuming that $odil is the path to the dedicated directory, and that $contemplata is the path to the cloned Contemplata repository, you can run the following commands to create an empty database in the DB subdirectory of $odil.

mkdir $odil
cd $odil
contemplata createdb -d DB

Then you can copy (a) the initial configuration files, (b) the webserver templates, and (c) the JavaScript file generated with Elm (see the front-end section), using the following commands:

cp -r $contemplata/config/* ./
cp -r $contemplata/backend/snaplets ./
cp -r $contemplata/annotool/main.js resources/public/
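
Before starting the server, it can be useful to confirm that the instance directory contains what the commands above are supposed to put there. The following sketch checks only the three paths named in this section; check_instance is a hypothetical helper, and a real instance will contain further configuration files that it does not look at.

```shell
# Verify that an instance directory contains the pieces created during setup.
# Prints each missing path and returns non-zero if anything is absent.
check_instance() {
    instance_dir=$1
    status=0
    for path in DB snaplets resources/public/main.js; do
        if [ ! -e "$instance_dir/$path" ]; then
            echo "missing: $instance_dir/$path" >&2
            status=1
        fi
    done
    return "$status"
}

# Usage:
#   check_instance "$odil" && echo "instance looks complete"
```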

You can read more about configuration in the corresponding README file.

Running

Use the following command to run the web-server in the $odil directory (the instance directory prepared in the setup section).

contemplata-server

By default, the application uses port 8000. You can change it using the -p option.

contemplata-server -p 8000

At this point, you can access the annotation tool via http://localhost:8000 (assuming that you performed the steps described in the setup section).

To start annotating, you will have to log in as administrator (login = admin, password = admin), change the password, create annotator accounts, upload files, and assign the files to the individual annotators, as explained below.

Administration

After you set up a local Contemplata instance and run the corresponding web-server, you will need to log in at http://localhost:8000 as an administrator to prepare the annotation environment. Initially, login = admin and password = admin. You can change the password straight away at the Password subpage (reachable via the top navigation bar).

Annotators

At first, two Contemplata accounts are set up: admin and guest. Both accounts are intended for special use cases: admin for administrative tasks, guest to give non-annotators access to selected documents and to the Contemplata user guide.

You can add actual annotator accounts via the Users subpage, which contains the list of the current annotators and a form to add new annotators.

Passwords

Forgotten passwords cannot be restored, but as an administrator you can change the password of an existing user. To this end, go to the Users subpage and use the form which also serves to add new annotators.

Upload

Initially, the annotation database is empty. To add new files for annotation, use the form present at the Upload subpage.

WARNING: upload only works with UTF-8-encoded files.
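
You can test a file's encoding before uploading it. The sketch below relies on the standard iconv utility, which fails when the input contains byte sequences that are invalid in the declared source encoding; is_utf8 is a hypothetical helper name.

```shell
# Succeed if the file is valid UTF-8, fail otherwise.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Usage:
#   is_utf8 corpus.ac || echo "corpus.ac is not UTF-8" >&2
#
# If the check fails and you know the actual encoding (ISO-8859-1 is only
# an assumption here), convert the file first:
#   iconv -f ISO-8859-1 -t UTF-8 corpus.ac > corpus-utf8.ac
```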

File IDs

When you upload a file, you need to specify the name of the file which consists of three parts:

  • The base name under which the file will be stored in the database.
  • The annotation level of the file, which makes it possible to distinguish the various copies of (originally) the same file annotated at different levels (syntax, semantic, etc.). The set of levels is specified in Contemplata's Dhall configuration; you can change it to better serve your annotation needs.
  • The ID of the file, to distinguish several copies of the same file annotated at the same level. You can fill it, e.g., with the name of the file's annotator.

Conventionally, Contemplata uses the BASE-NAME:LEVEL:ID format (i.e., with all the parts of the name separated with :) to refer to the file with the corresponding BASE-NAME, LEVEL, and ID.
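
If you handle such names in your own scripts, they can be split on the : separator. This is an illustrative sketch, not Contemplata's own parsing code: parse_file_id and the example name dialog1:syntax:alice are made up, and the function assumes that the base name and the level do not themselves contain a colon.

```shell
# Split a name of the form BASE-NAME:LEVEL:ID into three shell variables.
# Any extra ':' ends up in file_id, since it is the last variable read.
parse_file_id() {
    IFS=: read -r base_name level file_id <<EOF
$1
EOF
    # Fail if any of the three parts is missing.
    [ -n "$base_name" ] && [ -n "$level" ] && [ -n "$file_id" ]
}

parse_file_id "dialog1:syntax:alice" &&
    echo "base=$base_name level=$level id=$file_id"
```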

Formats

At the moment, two upload formats are supported: generic JSON files, respecting the appropriate formatting rules, and the corpus format (.ac XML files) of the GLOZZ annotation platform, which is also handled by the ANNODIS annotation tool.

For the latter format, the tool automatically performs certain pre-processing operations. Notably, it removes the social obligations-related expressions, a step which can be avoided by unchecking the corresponding checkbox during the file's upload.

Files

The list of files stored in the database can be found at the Files subpage. Click on the file of your choosing to see more information about it, assign annotators to it, download its JSON representation, and so on.

Assign annotators

The list of the annotators having access to the file can be found in the Annotators section of the corresponding subpage. Each annotator can either read or read-and-write the file. To change the annotator's modification rights, click on the corresponding link in the Can modify? column. You can also add new annotators for the file using the form below, or remove the annotator from the file using the remove link.

Download JSON

The Show JSON link, which lets you download the JSON version of the annotated file, can be found in the General information section.

Copy

The Copy form, which lets you create a copy of the file, can be found at the bottom of the subpage. It can be useful, e.g.:

  • To create a copy of the file for another user to annotate.
  • When annotation of the file at a given level (e.g., syntax) is finished and you want to create a copy to annotate higher levels (e.g., semantic).

Remove

The Remove link, which completely removes the file from the database, can be found in the General information section.

Status

Each file in the database is assigned a status, which tells whether the file is:

  • new -- freshly added to the database
  • touched -- its annotation has been commenced
  • done -- its annotation (at the given level) has been finished

Normally, the status of the file is updated automatically, based on the actions of its annotator(s). The administrator can nevertheless change it manually, by clicking on the corresponding link in the General information section.

Command-line tool

The Contemplata application suite provides the contemplata command-line tool, by default installed in the ~/.local/bin directory. It can be used to create a new database, add new files to the database, convert an FTB file to the PTB format, etc. Run:

contemplata -h

to see the tool's available options.

Architecture

Contemplata is implemented in a client/server architecture, with the advantage that the annotator does not have to install anything locally, and the server can provide the user with more advanced functionality. For instance, the server can be requested to syntactically re-analyze a given sentence in a way which takes the constraints specified directly by the annotator (e.g. a particular tokenization) into account. In the long run, the client/server architecture should also allow a more collaborative annotation style.

On the server side, Contemplata uses simple file-based storage for the annotated files. All the files are kept in the dedicated JSON format.

The web-server is implemented in Snap, a Haskell web framework. It handles regular HTTP requests (used to list the files, general administration work, etc.) as well as WebSocket requests, the latter used to communicate with the front-end annotation application.

The front-end is implemented in Elm, a Haskell-like language which compiles to JavaScript, so the tool can be used in any modern web browser. Being a high-level language, Elm makes it possible to implement sophisticated annotation-related functionality relatively quickly.

A Temporal@ODIL-dedicated instance of the tool can be found at http://vega.info.univ-tours.fr/odil/current. You can log in as a guest (password guest) to have a look. As a guest, you will not be allowed to store any changes you make, but you will have access to the user guide and will be able to play with the tool's functionality.

Format

All the files in the database are stored in a dedicated JSON format. This format is determined automatically on the basis of the corresponding File data type.

You can think of the File type as a definition of the structure against which the JSON files can be validated. You can perform the validation programmatically. First run stack ghci within the backend source directory and then enter:

import qualified Data.Aeson as JSON
import qualified Data.ByteString as BS
JSON.decodeStrict <$> BS.readFile "<path-to-json>" :: IO (Maybe File)

JSON from PTB

Contemplata provides a command-line tool which converts a file in the PTB bracketed format to the dedicated JSON format. So if you want to upload a file for annotation, it might be more convenient to prepare it in the PTB format, convert it as shown below, and upload it via the web interface afterwards.

contemplata penn2json < <file.ptb> > <file.json>
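
When there are many files to convert, the command above can be wrapped in a loop. The sketch below assumes that the input files carry a .ptb extension and writes each result next to its source. convert_dir and the PENN2JSON override are hypothetical; the override exists only so that the loop can be tried out without Contemplata installed.

```shell
# Convert every .ptb file in a directory to Contemplata's JSON format.
# By default the documented `contemplata penn2json` command is used;
# set PENN2JSON to substitute another filter (e.g. for a dry run).
convert_dir() {
    dir=$1
    for f in "$dir"/*.ptb; do
        [ -e "$f" ] || continue   # no .ptb files: the glob stays unexpanded
        ${PENN2JSON:-contemplata penn2json} < "$f" > "${f%.ptb}.json" || return 1
    done
}

# Usage:
#   convert_dir path/to/ptb-files
```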