/ixa-pipe-dep-eu

Euskaraz idatzitako testuetan dependentzia sintaktikoak etiketatzeko ixaKat tresna - http://ixa2.si.ehu.es/ixakat/ixa-pipe-dep-eu.php

Primary LanguageJavaGNU General Public License v3.0GPL-3.0

[euskaraz]

ixa-pipe-dep-eu

ixa-pipe-dep-eu is a dependency parser for Basque written documents. It is a tool of the ixaKat modular chain. It is based on the combination of the analyses obtained by different parsers. More precisely, Mate and MaltParser parsers are used to obtain the analyses, and MaltBlender tool is used to choose the best combination of those analyses.

The tool takes a document in NAF format. This input document should contain lemmas, PoS tags and morphological information. The input NAF document containing the necessary linguistic information could be obtained from the output of ixa-pipe-pos-eu.

Installation

There are two options in order to get this tool: get the source code and compile it or use the pre-compiled package. Anyway, some linguistic resources must be installed.

From source

Installing the ixa-pipe-dep-eu requires the following steps:

  • Install Java (JDK 1.7+)

  • Install Maven

  • Get module source code

    > git clone https://github.com/ixa-ehu/ixa-pipe-dep-eu.git
    
  • Compile

    > cd ixa-pipe-dep-eu
    > mvn clean package
    

This step will create a directory called target which contains various directories and files. Most importantly, there you will find the module executable:

ixa-pipe-dep-eu-2.0.0-exec.jar
  • Install the linguistic tools and resources as specified in this section.

Using the pre-compiled package

Instead of compiling from source, you can download the pre-compiled package that contains the executable file from the following link: ixa-pipe-dep-eu-v2.0.0.tgz

Decompress the package. The executable will be ready to use, without any installation, but you have to follow the steps in this section in order to install the linguistic tools and resources needed.

To run the tool, Java should be installed in your computer.

Installing the linguistic resources

Before starting using the tool, you have to follow the next steps in order to install the necessary resources and dependencies.

  • Download the package of the resources from the following link: dep-eu-resources-v2.0.0.tgz

  • Decompress the package and update the run.sh executable file changing the baliabideak variable to specify the path of the dep-eu-resources directory you just obtained.

How to use

The ixa-pipe-dep-eu-2.0.0-exec.jar executable is used to run the ixa-pipe-dep-eu tool. The only required argument (-b) is the path of the linguistic resources directory obtained in this section. The full command syntax of ixa-pipe-dep-eu-2.0.0-exec.jar is

> java -jar ixa-pipe-dep-eu-2.0.0-exec.jar [-h] -b RESOURCES_DIR [-c CONLL_FILE]

arguments:
 -h                show this help message and exit
 -b RESOURCES_DIR [Required] Specify the path of the downloaded resource directory
 -c CONLL_FILE    [Optional] If you want to save the output also in CONLL format, specify the path of the output file

A executable script run.sh is provided to run the ixa-pipe-dep-eu tool. You can use it, but before running it, update the rootDir and baliabideak variables on this script as specified in this section.

This tool reads from standard input. It should be UTF-8 encoded NAF format, containing lemmas, PoS tags and morphological annotations (text and terms elements of NAF). The input NAF document containing the necessary linguistic information could be obtained from the output of ixa-pipe-pos-eu.

Therefore, we can obtain syntactic dependencies of a plain text file using the following comand:

> cat test.txt | sh ixa-pipe-pos-eu/ixa-pipe-pos-eu.sh | sh ixa-pipe-dep-eu/run.sh

The output is written to standard output and it is in UTF-8 encoding and NAF format. In the NAF output document the syntactic dependencies will be marked by deps elements as it is shown in the example below:

<deps>
  <!--ncsubj(da, Zinemaldiko)-->
  <dep from="t6" to="t2" rfunc="ncsubj" />
  <!--ncmod(lehiatuko, sail)-->
  <dep from="t5" to="t3" rfunc="ncmod" />
  <!--ncmod(sail, ofizialean)-->
  <dep from="t3" to="t4" rfunc="ncmod" />
  <!--xpred(da, lehiatuko)-->
  <dep from="t6" to="t5" rfunc="xpred" />
</deps>

How to cite

If you use ixa-pipe-dep-eu tool, please cite the following paper in your academic work:

Iakes Goenaga, Koldo Gojenola, Nerea Ezeiza. Combining Clustering Approaches for Semi-Supervised Parsing: the BASQUE TEAM system in the SPRML 2014 Shared Task. Workshop on Statistical Parsing of Morphologically Rich Languages SPRML 2014 Shared Task, Dublin, COLING Workshop. 2014. (bibtex)

License

All the original code produced for ixa-pipe-dep-eu is licensed under GPL v3 free license.

This software uses a external tool, and it is distributed with the source code and the resources. This tool has its own license:

  • mate-tools anna: GNU General Public License, version 2

  • MaltParser: Copyright (C) 2007-2017, Johan Hall, Jens Nilsson and Joakin Nivre. Redistribution and use in source and binary forms, with or without modification, are permitted.

  • MaltOptimizer: Copyright (C) 2011, Miguel Ballesteros and Joakin Nivre. Redistribution and use in source and binary forms, with or without modification, are permitted.

Contact

Arantxa Otegi, arantza.otegi@ehu.eus
Iakes Goenaga, iakes.goenaga@ehu.eus