fast_align

fast_align is a simple, fast, unsupervised word aligner.

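If you use this software, please cite:

Chris Dyer, Victor Chahuneau, and Noah A. Smith. (2013). A Simple, Fast, and Effective Reparameterization of IBM Model 2. In Proc. of NAACL.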

The source code in this repository is provided under the terms of the Apache License, Version 2.0.

Input format

Input to fast_align must be tokenized and aligned into parallel sentences. Each line contains a source language sentence and its target language translation, separated by a triple pipe symbol (|||) with a space on either side. For example:

doch jetzt ist der Held gefallen . ||| but now the hero has fallen .
neue Modelle werden erprobt . ||| new models are being tested .
doch fehlen uns neue Ressourcen . ||| but we lack new resources .
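If your corpus is stored as separate token lists, a small helper can produce lines in this format and check them before alignment. The following Python sketch is illustrative only and is not part of fast_align: the names format_pair and parse_pair are hypothetical, and rejecting empty sentences and tokens that contain whitespace or the separator is an assumption made here to keep lines unambiguous.

```python
SEPARATOR = " ||| "


def format_pair(source_tokens, target_tokens):
    """Join two token lists into one 'source ||| target' input line."""
    # Assumption: both sides must be non-empty (not stated in the README).
    if not source_tokens or not target_tokens:
        raise ValueError("both sides must contain at least one token")
    for tok in list(source_tokens) + list(target_tokens):
        # A token must not be empty, contain whitespace, or be the separator.
        if tok == "" or tok == "|||" or any(ch.isspace() for ch in tok):
            raise ValueError(f"invalid token: {tok!r}")
    return " ".join(source_tokens) + SEPARATOR + " ".join(target_tokens)


def parse_pair(line):
    """Split one input line back into (source_tokens, target_tokens)."""
    parts = line.rstrip("\n").split("|||")
    if len(parts) != 2:
        raise ValueError(f"expected exactly one '|||' separator: {line!r}")
    source, target = parts
    return source.split(), target.split()
```

Calling format_pair on the tokens of the second example pair above reproduces that line exactly, and parse_pair recovers the two token lists from it.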

Compiling and using fast_align

fast_align requires only a C++ compiler; it can be compiled by typing make at the command line prompt.

Run fast_align to see a list of command line options. The following example invocation reads the parallel corpus text.fr-en and redirects the output to forward.align:

./fast_align -i text.fr-en -d -o -v > forward.align
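When alignment is one step in a larger pipeline, the same invocation can be driven from a script. The sketch below uses Python's standard subprocess module and passes only the flags shown in the example above. The helper names build_command and run_to_file are hypothetical, and the default binary path assumes that fast_align was compiled in the current directory.

```python
import subprocess


def build_command(corpus_path, binary="./fast_align"):
    """Return the argument list for the example invocation above."""
    # Only the flags from the README example are used; no others are assumed.
    return [binary, "-i", corpus_path, "-d", "-o", "-v"]


def run_to_file(command, output_path):
    """Run a command and write its standard output to output_path.

    This mirrors the shell redirection '> forward.align'.
    """
    with open(output_path, "w", encoding="utf-8") as out:
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(command, stdout=out, check=True)
```

With these helpers, run_to_file(build_command("text.fr-en"), "forward.align") corresponds to the command line shown above. Passing the arguments as a list avoids shell quoting problems when file names contain spaces.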