tensorflow-with-kenlm

Tensorflow with KenLM integrated for beam search scoring

Primary language: C++. License: Apache License 2.0 (Apache-2.0).



-----------------

Tensorflow with KenLM integration

This fork of TensorFlow integrates the KenLM language model into the ctc_beam_search_decoder operation, so that beam search hypotheses are also scored by the language model.

tf.nn.ctc_beam_search_decoder(logits,
                              output_sequence_lengths,
                              kenlm_directory_path='your/directory/path')
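For context, here is a slightly fuller sketch of the call in TensorFlow 1.x graph mode. It assumes the fork keeps upstream's signature (inputs, sequence_length, beam_width, ...) and return value, a tuple (decoded, log_probabilities) in which decoded is a list of SparseTensor objects, and only adds kenlm_directory_path. The helper names decode_with_kenlm and sparse_to_label_lists are hypothetical.

```python
def decode_with_kenlm(logits, sequence_lengths, kenlm_dir, beam_width=100):
    """Build a CTC beam search op scored with KenLM (TensorFlow 1.x graph mode).

    Assumption: the fork keeps upstream's inputs, sequence_length and
    beam_width arguments and only adds kenlm_directory_path.
    """
    import tensorflow as tf  # imported here so this module loads without TensorFlow

    # logits: float32 tensor of shape [max_time, batch_size, num_classes] (time-major).
    # sequence_lengths: int32 tensor of shape [batch_size].
    decoded, log_probabilities = tf.nn.ctc_beam_search_decoder(
        logits,
        sequence_lengths,
        beam_width=beam_width,
        kenlm_directory_path=kenlm_dir,
    )
    # decoded[0] is a SparseTensor holding the best path for every batch entry.
    return decoded[0], log_probabilities


def sparse_to_label_lists(indices, values, dense_shape):
    """Turn an evaluated SparseTensor (indices, values, dense_shape) into one
    list of label ids per batch entry, ordered by time step."""
    sequences = [[] for _ in range(int(dense_shape[0]))]
    for (batch, _time), label in sorted(zip(map(tuple, indices), values)):
        sequences[int(batch)].append(int(label))
    return sequences
```

After evaluating the op, e.g. value = sess.run(best_path), sparse_to_label_lists(value.indices, value.values, value.dense_shape) yields label ids that can be mapped to characters with the vocabulary file described below.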

The directory passed as kenlm_directory_path must contain three files:

kenlm-model.binary
vocabulary
trie
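A quick pre-flight check can catch a misconfigured directory before the decoder op runs. This is a hypothetical helper, not part of the fork; it only checks that the three file names listed above exist as regular files.

```python
import os

# File names the fork expects inside kenlm_directory_path.
REQUIRED_FILES = ("kenlm-model.binary", "vocabulary", "trie")


def missing_kenlm_files(directory):
    """Return the required file names that are not present in directory."""
    return [
        name
        for name in REQUIRED_FILES
        if not os.path.isfile(os.path.join(directory, name))
    ]
```

An empty list means all three files are present; otherwise the returned names can be reported before building the graph.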

See http://kheafield.com/code/kenlm/ to find out how to generate your kenlm-model.binary.
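As a rough sketch, assuming a KenLM build whose lmplz and build_binary tools are on PATH, the two standard KenLM steps (estimate an ARPA model with lmplz, then convert it with build_binary) can be scripted from Python. The helper names, the default order of 5 and the intermediate file name kenlm-model.arpa are illustrative choices, not requirements of this fork.

```python
import subprocess


def kenlm_commands(order, arpa_path, binary_path):
    """Return the argv lists for lmplz (ARPA estimation) and build_binary."""
    lmplz_cmd = ["lmplz", "-o", str(order)]  # reads text on stdin, writes ARPA to stdout
    binary_cmd = ["build_binary", arpa_path, binary_path]
    return lmplz_cmd, binary_cmd


def build_kenlm_model(corpus_path, order=5,
                      arpa_path="kenlm-model.arpa",
                      binary_path="kenlm-model.binary"):
    """Estimate an n-gram model from corpus_path and convert it to binary."""
    lmplz_cmd, binary_cmd = kenlm_commands(order, arpa_path, binary_path)
    with open(corpus_path, "rb") as src, open(arpa_path, "wb") as dst:
        subprocess.run(lmplz_cmd, stdin=src, stdout=dst, check=True)
    subprocess.run(binary_cmd, check=True)  # raises CalledProcessError on failure
```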

The vocabulary file maps your logit labels to characters. It must contain all allowed characters on a single line, and the position of each character in that line is its label id. For example (note that the line includes a space before the apostrophe):

abcdefghijklmnopqrstuvwxyz '
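The sketch below shows how such a file could be read and how decoded label ids map back to text. It assumes zero-based label ids and TensorFlow's usual CTC convention that the blank is the last class, so the logits would have one more class than the vocabulary has characters; both are assumptions rather than documented behavior of this fork, and the function names are hypothetical.

```python
def parse_vocabulary(line):
    """Split the single vocabulary line into a list indexed by label id.

    Only the line ending is stripped, because the space is a real label.
    Assumption: label ids are zero-based positions in the line.
    """
    return list(line.rstrip("\r\n"))


def load_vocabulary(path):
    """Read the vocabulary file (first line only)."""
    with open(path, encoding="utf-8") as f:
        return parse_vocabulary(f.readline())


def labels_to_text(label_ids, vocabulary):
    """Map decoded label ids back to a string."""
    return "".join(vocabulary[i] for i in label_ids)


def ctc_num_classes(vocabulary):
    """Logit width under TensorFlow's CTC convention (assumed to hold in this
    fork): one class per character plus a final blank class."""
    return len(vocabulary) + 1
```

With the example line above, label 26 is the space and label 27 the apostrophe.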

The trie is generated at the character level from a text corpus containing all words. The corpus file, corpus.txt in the commands below, must satisfy the following conditions:

  • it only contains words made of characters listed in vocabulary
  • words are separated by whitespace or newlines

Generate the trie with:

cd tensorflow-with-kenlm
bazel build -c opt --config=cuda //tensorflow/core/util/ctc:ctc_generate_trie
bazel-bin/tensorflow/core/util/ctc/ctc_generate_trie kenlm-model.binary vocabulary < corpus.txt > trie
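A conceptual sketch of the corpus-to-trie step in Python: it checks the corpus against the vocabulary (the first condition above) and builds a plain character-level prefix trie of the words. It does not reproduce the trie file written by ctc_generate_trie, whose format is not documented here and which also takes the KenLM model as input. How the fork uses the trie during beam search is likewise not described here; a prefix lookup like is_prefix is only the usual reason a decoder keeps one. All names are hypothetical.

```python
END = None  # marker key for "a word ends here"; never collides with a character


def find_invalid_words(lines, vocabulary):
    """Return corpus words (unique, first-seen order) that use characters
    not listed in the vocabulary."""
    allowed = set(vocabulary)
    invalid = {}
    for line in lines:
        for word in line.split():  # str.split() splits on spaces, tabs and newlines
            if not set(word) <= allowed:
                invalid[word] = None
    return list(invalid)


def build_char_trie(words):
    """Build a character-level prefix trie as nested dictionaries."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def _walk(trie, text):
    """Follow text character by character; return the final node or None."""
    node = trie
    for ch in text:
        node = node.get(ch)
        if node is None:
            return None
    return node


def is_prefix(trie, text):
    """True if text is a prefix of at least one corpus word."""
    return _walk(trie, text) is not None


def is_word(trie, text):
    """True if text is a complete corpus word."""
    node = _walk(trie, text)
    return node is not None and END in node
```

Passing an open corpus.txt file object as lines works because iterating a text file yields one line at a time.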

How to compile TensorFlow

See Download and Setup for more detailed instructions.

./configure
bazel build -c opt //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
pip install /tmp/tensorflow_pkg/tensorflow-*.whl --upgrade
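After installing the wheel, one way to confirm that Python picks up this fork rather than a stock TensorFlow is to check whether the decoder accepts kenlm_directory_path. This hypothetical helper assumes the argument is exposed as a keyword of the Python wrapper, as the usage example at the top suggests.

```python
import inspect


def supports_kenlm(decoder_fn):
    """Return True if decoder_fn accepts a kenlm_directory_path argument."""
    try:
        parameters = inspect.signature(decoder_fn).parameters
    except (TypeError, ValueError):  # raised when no signature can be inspected
        return False
    return "kenlm_directory_path" in parameters
```

Usage: after import tensorflow as tf, supports_kenlm(tf.nn.ctc_beam_search_decoder) should return True for this fork and False for an upstream build.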


TensorFlow is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) that flow between them. This flexible architecture lets you deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device without rewriting code. TensorFlow also includes TensorBoard, a data visualization toolkit.

TensorFlow was originally developed by researchers and engineers working on the Google Brain team within Google's Machine Intelligence research organization for the purposes of conducting machine learning and deep neural network research. The system is general enough to be applicable in a wide variety of other domains as well.

If you'd like to contribute to TensorFlow, be sure to review the contribution guidelines.

We use GitHub issues for tracking requests and bugs, but please see Community for general questions and discussion.

Installation

See Installing TensorFlow for instructions on how to install our release binaries or how to build from source.

People who are a little more adventurous can also try our nightly binaries.

Try your first TensorFlow program

$ python
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
>>> sess.run(hello)
b'Hello, TensorFlow!'
>>> a = tf.constant(10)
>>> b = tf.constant(32)
>>> sess.run(a+b)
42
>>>

For more information

The TensorFlow community has created amazing things with TensorFlow; please see the resources section of tensorflow.org for an incomplete list.