termsuite-istex

Java and command-line TermSuite launchers for terminology extraction against the ISTEX API

Primary language: Java. License: Apache License 2.0 (Apache-2.0).

A TermSuite launcher on ISTEX documents.

Command Line

  1. Download the latest termsuite-istex jar.
  2. Run the ISTEX launcher:
$ java -cp termsuite-istex-1.1.2.jar \
      fr.univnantes.termsuite.istex.cli.IstexLauncher \
      -t /path/to/tagger \
      -l en \
      --tsv istex-termino.tsv \
      --doc-id F697EDBD85006E482CD1AC91DE9D40F6C629727A,15101397F055B3A872D495F7405D0A3F3E195E0F
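The Java API is not available yet, so one way to call the launcher from Java code is to spawn the command above as a child process. The following is a minimal sketch under that assumption: `IstexCommand` and its methods are hypothetical names, not part of termsuite-istex, and the jar name, tagger path, language and output file are simply the values used in the command above.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of termsuite-istex) that rebuilds the CLI call above. */
final class IstexCommand {

    static final String MAIN_CLASS = "fr.univnantes.termsuite.istex.cli.IstexLauncher";

    /** Builds the argument list for "java -cp <jar> IstexLauncher ..." without running it. */
    static List<String> build(String jar, String taggerPath, String lang,
                              String tsvOutput, List<String> docIds) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-cp");
        cmd.add(jar);
        cmd.add(MAIN_CLASS);
        cmd.add("-t");          // path to the tagger
        cmd.add(taggerPath);
        cmd.add("-l");          // corpus language
        cmd.add(lang);
        cmd.add("--tsv");       // TSV output file
        cmd.add(tsvOutput);
        // --doc-id takes all ids as one comma-separated argument
        cmd.add("--doc-id");
        cmd.add(String.join(",", docIds));
        return cmd;
    }

    /** Starts the launcher, forwards its console output, and returns its exit code. */
    static int run(List<String> cmd) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        return process.waitFor();
    }
}
```

For example, `IstexCommand.run(IstexCommand.build("termsuite-istex-1.1.2.jar", "/path/to/tagger", "en", "istex-termino.tsv", ids))` would run the same extraction as the shell command and return the launcher's exit code. Passing the arguments as a list avoids shell quoting issues.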

Selecting documents

Exactly one of --doc-id or --id-file must be passed.

  • --id-file FILE: A file containing the list of ISTEX document ids of the corpus
  • --doc-id STRING: A comma-separated list of ISTEX document ids
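When the corpus is large, an id file is easier to manage than a long --doc-id value. This README does not state the id-file format; the sketch below assumes one ISTEX id per line, which should be checked against the launcher before relying on it. `IstexIds` is a hypothetical helper that produces either form from the same list of ids.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical helpers for the two document-selection options. */
final class IstexIds {

    /** Value for --doc-id: the ids joined with commas. Rejects blank ids or ids containing commas. */
    static String docIdValue(List<String> ids) {
        for (String id : ids) {
            if (id.isBlank() || id.contains(",")) {
                throw new IllegalArgumentException("invalid ISTEX id: '" + id + "'");
            }
        }
        return String.join(",", ids);
    }

    /** Writes a file for --id-file, ASSUMING the launcher expects one id per line (UTF-8). */
    static Path writeIdFile(List<String> ids, Path file) throws IOException {
        docIdValue(ids); // same validation as for --doc-id
        return Files.write(file, ids);
    }
}
```

With the two ids from the command-line example, `docIdValue` returns the same string as the `--doc-id` argument shown there, while `writeIdFile` would put each id on its own line.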

Outputting the extracted terminology

At least one of --tsv, --json or --tbx must be passed.

  • --json FILE: Writes the terminology to a JSON file
  • --tbx FILE: Writes the terminology to a TBX file
  • --tsv FILE: Writes the terminology to a TSV file
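The two rules stated for document selection and output (exactly one selection option, at least one output option) can be checked before starting a potentially long extraction. The following is a hypothetical pre-flight check, not code from termsuite-istex; it only recognises flags passed as separate tokens (`--tsv out.tsv`), not `--tsv=out.tsv`, and it does not inspect option values.

```java
import java.util.List;

/** Hypothetical pre-flight check mirroring the documented option rules. */
final class IstexOptionCheck {

    static final List<String> SELECTION = List.of("--doc-id", "--id-file");
    static final List<String> OUTPUTS = List.of("--tsv", "--json", "--tbx");

    /** Throws IllegalArgumentException if the argument list breaks either rule. */
    static void check(List<String> args) {
        // Rule 1: exactly one of --doc-id / --id-file
        long selections = args.stream().filter(SELECTION::contains).count();
        if (selections != 1) {
            throw new IllegalArgumentException(
                "exactly one of --doc-id or --id-file must be passed, got " + selections);
        }
        // Rule 2: at least one of --tsv / --json / --tbx
        boolean hasOutput = args.stream().anyMatch(OUTPUTS::contains);
        if (!hasOutput) {
            throw new IllegalArgumentException(
                "at least one of --tsv, --json or --tbx must be passed");
        }
    }
}
```

Several outputs can be requested in one run, so an argument list containing both `--tsv istex-termino.tsv` and `--json istex-termino.json` passes the check, whereas one containing both `--doc-id` and `--id-file` is rejected.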

Other options

Many additional configuration options are available (TSV output configuration, filtering, extraction pipeline configuration, etc.). All options of TermSuite's TerminoExtractorCLI script are also available with IstexLauncher. See the official TerminoExtractorCLI documentation for details.

Run with docker

The main advantage of using the Docker container for termsuite-istex is that you no longer need to install and configure an external tagger.

See termsuite-istex-docker for more information.

Java API

To come.