Word translation using python

Cheap Translation using Python and Lexicons

This code translates an input file word by word using a dictionary (lexicon). The input file must be in CoNLL column format; see eng.conll for an example.
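To illustrate the general idea, here is a minimal sketch of dictionary-based translation over column-formatted lines. It is not the code in translate.py. The function name `translate_conll_lines`, the `word_col` parameter, the two-column toy input, and the toy lexicon are all assumptions made for illustration; check eng.conll for the actual column layout.

```python
def translate_conll_lines(lines, lexicon, word_col=0):
    """Replace the word column of each token line with a lexicon lookup.

    Assumptions (not taken from translate.py): columns are separated by
    whitespace, blank lines separate sentences, and `word_col` is the
    index of the column holding the word.
    """
    out = []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            # Keep sentence boundaries as empty lines.
            out.append("")
            continue
        cols = stripped.split()
        word = cols[word_col]
        # Try the exact form, then lowercase; keep the word if it is unknown.
        cols[word_col] = lexicon.get(word, lexicon.get(word.lower(), word))
        out.append("\t".join(cols))
    return out


# Toy English-to-Turkish lexicon and a toy two-column input (word, tag).
toy_lexicon = {"house": "ev", "big": "büyük"}
sample = ["big O", "house O", "", "Ankara B-LOC"]

for row in translate_conll_lines(sample, toy_lexicon):
    print(row)
```

Words missing from the lexicon (here, "Ankara") pass through unchanged, and all other columns are left as they were, so any labels on the line stay attached to the translated word.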

Requirements

  • Python 3
  • swig-srilm wrapper
  • (Optional) gensim (if you want to use the word vector expansion part)
  • (Optional, but recommended) Language model created by SRILM.

Here's the simplest possible way to make a language model (<input file> is just a text file):

$ ngram-count -text <input file> -lm <output file>

See ngram-count for more documentation.

Usage

To translate eng.conll from English (eng) into Turkish (tur):

$ python translate.py -i eng.conll -o tur.conll -t tur

eng.conll is included in the repository, and tur.conll is the output file produced by this command. Notice that the -s argument is not needed, because English is the default source language.

To translate interactively (from English, to Turkish):

$ python translate.py -t tur

There are some configuration variables in utils.py. Be sure to set these before running.

Paper

This code was used in the paper 'Cheap Translation for Cross-Lingual Named Entity Recognition' (EMNLP 2017). See here for details.

Lexicons

In the paper, we used lexicons from the Masterlex project at the University of Washington. These are not yet available for public release, but in the meantime, we recommend the excellent set of lexicons from Ellie Pavlick et al., described in this paper and available for download here.

The Pavlick dictionary set is the default setting in utils.py.

Google Translate API Client

senttrans.py translates at the sentence level using the Google Translate API. This code is somewhat out of date and may no longer work.

The Google Translate API Client can be a little confusing. See this page for ideas on how to use the library. You will need an API key, which will cost you money.

Installation:

$ pip install --upgrade google-api-python-client

Here's a useful example.
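For orientation, a minimal sketch of calling the Translate v2 API through google-api-python-client might look like the following. This is not the code in senttrans.py. The helper names `build_service` and `translate_sentences` are hypothetical, the API key is a placeholder you must supply, and the target code "tr" assumes the API's two-letter language codes rather than the three-letter codes used by translate.py.

```python
def build_service(api_key):
    """Create a Translate v2 service object (requires network access)."""
    # Imported here so the rest of the sketch loads without the package.
    from googleapiclient.discovery import build

    return build("translate", "v2", developerKey=api_key)


def translate_sentences(service, sentences, source="en", target="tr"):
    """Translate a list of sentences and return the translated strings."""
    response = (
        service.translations()
        .list(source=source, target=target, q=sentences)
        .execute()
    )
    # Each entry in "translations" carries one "translatedText" field.
    return [item["translatedText"] for item in response["translations"]]


if __name__ == "__main__":
    import os

    # Hypothetical environment variable holding your own API key.
    key = os.environ.get("GOOGLE_API_KEY")
    if key:
        service = build_service(key)
        print(translate_sentences(service, ["The house is big."]))
```

Passing the service object in as an argument keeps the translation step separate from authentication, which also makes it possible to try the function with a stand-in object before spending money on real API calls.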