/joshua

Joshua Statistical Machine Translation Toolkit

Primary LanguageJavaOtherNOASSERTION

Running the Joshua Decoder:
---------------------------

If you wish to run the complete machine translation pipeline, Joshua includes a
black-box implementation that enables the entire pipeline to be run by typing
a single restartable command.  See the documentation for a walkthrough and more
information about the many options available to the pipeline.

   - web:           http://joshua-decoder.org/5.0/pipeline.html 
   - local mirror:  ./joshua-decoder.org/5.0/pipeline.html

Manually Running the Joshua Decoder:
------------------------------------

To run the decoder, first set these environment variables:

    export JAVA_HOME=/path/to/java  # maybe /usr/java/home
    export JOSHUA=/path/to/joshua

You might also find it helpful to set these:

    export LC_ALL=en_US.UTF-8
    export LANG=en_US.UTF-8

Then, compile Joshua by typing:

    cd $JOSHUA
    ant

The basic method for invoking the decoder looks like this:

    cat SOURCE | JOSHUA -c CONFIG > OUTPUT

You can test this using the sample configuration files and inputs can be found 
in the example/ directory.  For example, type:

    cat examples/example/test.in | $JOSHUA/bin/decoder -c examples/example/joshua.config

The decoder output will load the language model and translation models defined
in the configuration file, and will then decode the five sentences in the
example file.

There are a variety of command line options that you can feed to Joshua.
For example, you can enable multithreaded decoding with the -threads N flag:

    cat examples/example/test.in | $JOSHUA/bin/decoder -c examples/example/joshua.config -threads 5

The configuration file defines many additional parameters, all of which can be
overridden on the command line by using the format -PARAMETER value.  For
example, to output the top 10 hypotheses instead of just the top 1 specified in
the configuration file, use -top-n N:

    cat examples/example/test.in | $JOSHUA/bin/decoder -c examples/example/joshua.config -top-n 10

Parameters, whether in the configuration file or on the command line, are
converted to a canonical internal representation that ignores hyphens,
underscores, and case.  So, for example, the following parameters are all
equivalent:

  {top-n, topN, top_n, TOP_N, t-o-p-N}
  {poplimit, pop-limit, pop-limit, popLimit}

and so on.  For an example of parameters, see the Joshua configuration file
template in $JOSHUA/scripts/training/templates/tune/joshua.config or the online
documentation at joshua-decoder.org/4.0/decoder.html.  There is a wealth of
information in the online documentation.

After you have successfully run the decoding example above, we recommend that
you take a look at the Joshua pipeline script, which allows you to do full 
end-to-end training of a translation model.  It is stored in

    $JOSHUA/examples