This repository contains tools for handling parallel corpora:
- nafil: Performs sentence filtering for noisy corpora
- namone: Trains IBM Model 1
- nabss: Performs bilingual sentence selection
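For readers unfamiliar with IBM Model 1, the following is a minimal sketch of the standard EM training procedure for lexical translation probabilities. It is not taken from namone; the function name `train_ibm_model1`, the `NULL` token string, and the input format (a list of tokenized sentence pairs) are assumptions made for illustration only.

```python
from collections import defaultdict

NULL = "<NULL>"  # hypothetical placeholder for the empty source word


def train_ibm_model1(pairs, iterations=10):
    """Estimate t[(f, e)] = P(f | e) with EM.

    pairs: list of (e_tokens, f_tokens) sentence pairs (assumed format).
    """
    # Start from a uniform distribution over the f-side vocabulary.
    f_vocab = {f for _, f_sent in pairs for f in f_sent}
    t = defaultdict(lambda: 1.0 / len(f_vocab))

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) links
        total = defaultdict(float)  # expected count of e being linked at all

        # E-step: distribute each f word over the e words of its sentence.
        for e_sent, f_sent in pairs:
            e_sent = [NULL] + list(e_sent)
            for f in f_sent:
                z = sum(t[(f, e)] for e in e_sent)
                for e in e_sent:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c

        # M-step: renormalize the expected counts per e word.
        t = defaultdict(
            float, {(f, e): c / total[e] for (f, e), c in count.items()}
        )
    return t
```

Calling `train_ibm_model1` on a handful of tokenized sentence pairs returns a dictionary keyed by `(f, e)` word pairs; for each `e`, the probabilities over all co-occurring `f` words sum to one.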
The method for "nafil" is inspired by the following paper:
Improving Machine Translation Performance by Exploiting Non-Parallel Corpora. Dragos Stefan Munteanu, Daniel Marcu. Computational Linguistics, 2005.
The method for "nabss" is inspired by the following paper:
Does more data always yield better translations? Guillem Gascó, Martha-Alicia Rocha, Germán Sanchis-Trilles, Jesús Andrés-Ferrer and Francisco Casacuberta. EACL 2012.