CTranslate is a C++ implementation of OpenNMT's translate.lua script with no LuaTorch dependencies. It facilitates the use of OpenNMT models in existing products and on various platforms, using Eigen as a backend.
It only supports CPU translation of OpenNMT models released with the release_model.lua script.
- Eigen > 3.3
Compiling executables additionally requires:
- Boost (program_options)
CMake and a compiler that supports the C++11 standard are required to compile the project.
git submodule update --init
mkdir build
cd build
cmake -DEIGEN_ROOT=<path to Eigen library> -DCMAKE_BUILD_TYPE=<Release or Debug> ..
make
It will produce the dynamic library libonmt.so (or .dylib on Mac OS, .dll on Windows) and the translation client cli/translate. CTranslate also bundles OpenNMT's Tokenizer, which provides the tokenization tools lib/tokenizer/cli/tokenize and lib/tokenizer/cli/detokenize.
- To compile only the library, use the -DLIB_ONLY=ON flag.
- To disable OpenMP, use the -DWITH_OPENMP=OFF flag.
- Compile in release mode (-DCMAKE_BUILD_TYPE=Release).
- Unless you are cross-compiling for a different architecture, add -DCMAKE_CXX_FLAGS="-march=native" to the cmake command above to optimize for speed.
- Consider using Intel® MKL if available. You should follow Eigen's instructions to link against it.
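To check which of these build settings actually reached your own code, a small helper can report them. The sketch below is not part of CTranslate: build_summary is a hypothetical name, and it relies only on two predefined macros. It assumes that CMake's default Release flags define NDEBUG (the usual behavior with common compilers) and that the compiler defines _OPENMP when OpenMP is enabled. It describes the translation unit it is compiled in, not the libonmt library itself.

```cpp
#include <string>

// Hypothetical helper (not part of CTranslate): summarize how this
// translation unit was compiled, using only predefined macros.
std::string build_summary()
{
  std::string summary;

  // NDEBUG is normally defined by CMake's default Release flags.
#ifdef NDEBUG
  summary += "assertions: disabled (typical of a Release build)";
#else
  summary += "assertions: enabled (typical of a Debug build)";
#endif

  // _OPENMP is defined by the compiler when OpenMP support is enabled.
#ifdef _OPENMP
  summary += "; OpenMP: enabled";
#else
  summary += "; OpenMP: disabled";
#endif

  return summary;
}
```

Printing build_summary() once at startup is a cheap way to notice that a benchmark was accidentally run against a Debug build.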
See --help on the clients to discover available options and usage. They have the same interface as their Lua counterparts.
This project is also a convenient way to load OpenNMT models and translate texts in existing software.
Here is a very simple example:
#include <iostream>

#include <onmt/onmt.h>

int main()
{
  // Create a new Translator object.
  auto translator = onmt::TranslatorFactory::build("enfr_model_release.t7");

  // Translate a tokenized sentence.
  std::cout << translator->translate("Hello world !") << std::endl;

  return 0;
}
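The input in the example above is already tokenized: tokens are separated by spaces, which is why the exclamation mark stands apart in "Hello world !". As a rough illustration of that format, the sketch below splits such a sentence into tokens and joins it back. It uses only the C++ standard library; split_tokens and join_tokens are hypothetical helper names, not CTranslate or Tokenizer functions, and real text should go through the bundled tokenization tools rather than a plain whitespace split.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of CTranslate): split an already
// tokenized sentence on whitespace into its individual tokens.
std::vector<std::string> split_tokens(const std::string& sentence)
{
  std::istringstream stream(sentence);
  std::vector<std::string> tokens;
  std::string token;
  while (stream >> token)
    tokens.push_back(token);
  return tokens;
}

// Hypothetical helper: join tokens with single spaces, producing the
// space-separated form used as input in the example above.
std::string join_tokens(const std::vector<std::string>& tokens)
{
  std::string sentence;
  for (std::size_t i = 0; i < tokens.size(); ++i)
  {
    if (i > 0)
      sentence += ' ';
    sentence += tokens[i];
  }
  return sentence;
}
```

For instance, split_tokens("Hello world !") yields the three tokens Hello, world and !, and join_tokens applied to them gives back the original string.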
For more advanced usage, see:
- include/onmt/TranslatorFactory.h to instantiate a new translator
- include/onmt/ITranslator.h (the Translator interface) to translate sequences or batches of sequences
- include/onmt/TranslationResult.h to retrieve results and attention vectors
- include/onmt/Threads.h to programmatically control the number of threads to use
Also see the headers available in the Tokenizer that are accessible when linking against CTranslate.