githubharald/CTCDecoder

Difference between this repo and the CTC decoder in TensorFlow

cosimo17 opened this issue · 4 comments

Thanks for your code.
I tested my sequences with both the CTC decoder in TensorFlow and the one in this repo, and I always get different results. TensorFlow is always right; this repo sometimes returns the right result and sometimes the wrong one.
Have you ever compared these two implementations?

I compared my C++ implementation (basically the same as the Python code in this repo) with the TF beam search decoder and got roughly the same accuracy / char error rate.
When using a language model, my beam search decoder usually outperformed the one from TF.

  • Did you use the same beam width?
  • Did you feed the output of the neural network with softmax already applied into my decoder?
  • Did you disable the language model (lm=None)?

If you can provide a sample for which the results are worse than in TF, I can have a look. Provide the input to the decoder (e.g. for the matrix you can use numpy.save) and also say what the expected result is.
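Something like this is enough to share the matrix (a minimal sketch; the random matrix is only a stand-in for your actual network output):

import numpy as np

mat = np.random.rand(100, 80).astype(np.float32) # stand-in for the network output, shape [T, C]
np.save('test_matrix.npy', mat) # writes test_matrix.npy
mat_loaded = np.load('test_matrix.npy') # load it back on the other side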

Oh, thanks for your quick reply.

  1. I use the same beam width (set to 25).
  2. Whether I apply softmax or not, the result still differs from TF.
  3. lm is set to None.

This is a sample matrix given by the network, with shape [60,1,64].
(The .npy file is inside the zip.)
test_matrix.zip

(This is the shape format needed by the TF CTC decoder. You need to squeeze axis 1 before feeding it to your beam search algorithm, i.e. something like the sketch below.)
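A minimal sketch of that conversion (assuming the file name from the zip):

import numpy as np

mat_tbc = np.load('test_matrix.npy') # shape [60, 1, 64], TF layout [T, B, C]
mat_tc = np.squeeze(mat_tbc, axis=1) # shape [60, 64] for the decoder in this repo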

The result given by TF (it is correct):
[56 29 34 49 45 51 5 49 32 47 21 24]

The result given by the CTC decoder in this repo (21, 19, 48: three redundant chars are reported):
(56, 29, 34, 49, 45, 51, 5, 49, 32, 47, 21, 24, 21, 19, 48)

The result given by the CTC decoder in this repo (with softmax applied along axis 2 of the sample matrix):
(56, 29, 34, 49, 45, 51, 5, 49, 32, 47, 21, 24, 59, 48)

I'm not familiar with the CTC decoding algorithm; I just use it as a black box. Hoping for your help. Thanks.

Hi,

I just compared the output of my beam search decoder with the one from TF; the results are the same:

My result: [56, 29, 34, 49, 45, 51, 5, 49, 32, 47, 21, 24, 59, 48]
TF result: [56, 29, 34, 49, 45, 51, 5, 49, 32, 47, 21, 24, 59, 48]
Is equal: True

Code to reproduce:

import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
from BeamSearch import ctcBeamSearch


def softmax(mat):
	"calc softmax such that labels per time-step form probability distribution"
	maxT, _ = mat.shape # dim0=t, dim1=c
	res = np.zeros(mat.shape)
	for t in range(maxT):
		y = mat[t, :]
		e = np.exp(y - np.max(y)) # shift by max for numerical stability
		s = np.sum(e)
		res[t, :] = e/s
	return res

def compare_decoders():
	mat_tbc = np.load('test_matrix.npy') # shape [60, 1, 64], i.e. [T, B, C]
	mat_sm_tc = softmax(mat_tbc[:, 0, :]) # drop the batch axis and apply softmax -> [T, C]

	# my decoder
	fake_classes = [chr(65+i) for i in range(mat_sm_tc.shape[1]-1)] # dummy chars for the 63 non-blank classes
	my_res = ctcBeamSearch(mat_sm_tc, classes=fake_classes, beamWidth=25, lm=None)
	my_labels = [fake_classes.index(c) for c in my_res]
	print('My result:', my_labels)

	# TF decoder
	tensor_mat = tf.placeholder(tf.float32, mat_tbc.shape)
	# sequence_length must cover all time-steps, here [60]
	tensor_ctc = tf.nn.ctc_beam_search_decoder(tensor_mat, [mat_tbc.shape[0]], beam_width=25, merge_repeated=False)
	sess = tf.Session()
	tf_res = sess.run(tensor_ctc, {tensor_mat: mat_tbc})
	tf_labels = tf_res[0][0].values.tolist()
	print('TF result:', tf_labels)
	print('Is equal:', tf_labels==my_labels)

	# plot matrix
	plt.imshow(mat_sm_tc)
	plt.xlabel('chars')
	plt.ylabel('time')
	plt.show()


if __name__ == '__main__':
	compare_decoders()

Here is what your input matrix looks like. It seems that in your code you are somehow ignoring the last few time-steps (marked red): there are some non-blank characters recognized at the end of the sequence.
This may be caused by your sequence_length input parameter to the TF decoder.

[Figure 1: the softmax-normalized input matrix (chars vs. time); the last few time-steps are marked red]
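If that's the case, the fix is to make sequence_length cover all time-steps of the input; a minimal sketch, mirroring the reproduce code above:

import numpy as np
import tensorflow as tf

mat_tbc = np.load('test_matrix.npy') # shape [60, 1, 64], i.e. [T, B, C]
tensor_mat = tf.placeholder(tf.float32, mat_tbc.shape)

# a hard-coded, too-small sequence_length would make the decoder ignore the tail of the sequence
seq_len = [mat_tbc.shape[0]] # [60], derived from the matrix itself
tensor_ctc = tf.nn.ctc_beam_search_decoder(tensor_mat, seq_len, beam_width=25, merge_repeated=False)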

Oh, I understand now. I made a low-level mistake.
The seq-length param was set to 30, which matched the output length of my first network version.
I forgot to change it when I changed the actual length of the network output from 30 to 60.
Your code is right. Thanks for your help. Now I can continue my work.