nafil

A program for performing bilingual corpus filtering

Primary language: C++. License: Other (NOASSERTION).

This repository contains tools for handling parallel corpora:

  • nafil: Performs sentence filtering for noisy corpora
  • namone: Trains IBM Model 1
  • nabss: Performs bilingual sentence selection

The method for "nafil" is inspired by the following paper:

Improving Machine Translation Performance by Exploiting Non-Parallel Corpora. Dragos Stefan Munteanu, Daniel Marcu. Computational Linguistics, 2005.

The method for "nabss" is inspired by the following paper:

Does more data always yield better translations? Guillem Gascó, Martha-Alicia Rocha, Germán Sanchis-Trilles, Jesús Andrés-Ferrer and Francisco Casacuberta. EACL 2012.