ashwhall/dsnt

Expected ground truth labels range unclear


Hi,
Thanks for this TF implementation. From the code and the functions' docstrings, it is unclear to me what range is expected for the ground-truth coordinates: the Gaussian generation function clearly expects coordinates between 0 and 1, whereas in the computation of dsnt_x and dsnt_y the coordinate values are between -1 and 1, so the output coordinates will be between -1 and 1. So, what range should I pick for my labels?
Best regards,
Pierre

I used [0, 1] for my labels, but as the dsnt layer produces predictions in [-1, 1], I modify the outputs of the dsnt layer.

Perhaps looking at https://github.com/ashwhall/dsnt/blob/master/dsnt.py would be helpful to explain what I mean.

Hi, thanks for your answer. From what I understand from reading your code, the outputs are in [-1, 1], which means they need to be rescaled to [0, 1] before computing the loss against labels in [0, 1]. Is that right?
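A minimal sketch of that rescaling, in case it helps anyone reading along. The helper names are hypothetical and not part of dsnt.py; the same arithmetic applies elementwise to tensors.

```python
def to_unit_range(coord):
    """Map a coordinate from [-1, 1] to [0, 1] (hypothetical helper)."""
    return (coord + 1.0) / 2.0


def to_signed_range(coord):
    """Inverse mapping: from [0, 1] back to [-1, 1] (hypothetical helper)."""
    return 2.0 * coord - 1.0


# Rescale a prediction in [-1, 1] so it can be compared with a label in [0, 1].
prediction = -0.5
label = 0.25
error = to_unit_range(prediction) - label
print(error)  # 0.0
```

Alternatively, the labels could be moved into [-1, 1] with to_signed_range and the layer outputs left untouched; either way both sides of the loss need to share one convention.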

The only thing I do not understand is why the Gaussian supervision map computation does this:
x0 = centre[0] - 0.5
y0 = centre[1] - 0.5

I changed the code for my own use in two ways: I compute the outputs in [0, 1] directly by changing the range of dsnt_x and dsnt_y to [0, 1], and when the label falls outside the image I replace the Gaussian with a uniform map that sums to 1. Is there something that might break because of it?
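To make the second change concrete, here is a rough sketch of what such a target map could look like. It is an illustration only, not the code from dsnt.py: the function name, the sigma default, and the choice to place pixel centres at (i + 0.5) / width are all assumptions.

```python
import math


def target_heatmap(centre, height, width, sigma=0.1):
    """Build a height x width target map (list of rows) that sums to 1.

    `centre` is an (x, y) label in [0, 1]. If it lies outside the image,
    a uniform map is returned instead of a Gaussian (hypothetical helper).
    """
    cx, cy = centre
    if not (0.0 <= cx <= 1.0 and 0.0 <= cy <= 1.0):
        # Label outside the image: spread the mass evenly over all pixels.
        value = 1.0 / (height * width)
        return [[value] * width for _ in range(height)]

    rows = []
    for j in range(height):
        y = (j + 0.5) / height  # assumed pixel-centre convention
        row = []
        for i in range(width):
            x = (i + 0.5) / width
            dist_sq = (x - cx) ** 2 + (y - cy) ** 2
            row.append(math.exp(-dist_sq / (2.0 * sigma ** 2)))
        rows.append(row)

    # Normalise so the map is a valid probability distribution.
    total = sum(sum(row) for row in rows)
    return [[v / total for v in row] for row in rows]
```

With a uniform target, the expected coordinate under that map is the image centre, so it may be worth checking how the coordinate part of the loss is handled for those samples.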

It seems to me that, following the paper's approach, you need to prepare the labels in the range (-1, 1), excluding -1 and 1. But I think this range can also be changed to [-1, 1] or [0, 1] (both inclusive) without affecting anything.

For example, I changed these lines:

dsnt/dsnt.py

Lines 26 to 30 in 2a6761a

# Build the DSNT x, y matrices
dsnt_x = tf.tile([[(2 * tf.range(1, width+1) - (width + 1)) / width]], [batch_count, height, 1])
dsnt_x = tf.cast(dsnt_x, tf.float32)
dsnt_y = tf.tile([[(2 * tf.range(1, height+1) - (height + 1)) / height]], [batch_count, width, 1])
dsnt_y = tf.cast(tf.transpose(dsnt_y, perm=[0, 2, 1]), tf.float32)

To be like this:

    # Build the DSNT x, y matrices
    # dsnt_x = tf.tile([[(2 * tf.range(1, width+1) - (width + 1)) / width]], [batch_count, height, 1])
    dsnt_x = tf.tile([[tf.range(0, width) / (width-1)]], [batch_count, height, 1])
    dsnt_x = tf.cast(dsnt_x, tf.float32)
    # dsnt_y = tf.tile([[(2 * tf.range(1, height+1) - (height + 1)) / height]], [batch_count, width, 1])
    dsnt_y = tf.tile([[tf.range(0, height) / (height-1)]], [batch_count, width, 1])
    dsnt_y = tf.cast(tf.transpose(dsnt_y, perm=[0, 2, 1]), tf.float32)

So that the tensor consists of numbers from 0 to 1 (inclusive) instead of between -1 and 1 (exclusive), and it works for my use case.
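For anyone comparing the two conventions, this small pure-Python sketch evaluates the same formulas for one row of the grid. The function names are made up for illustration, and grid_unit assumes n > 1 because it divides by n - 1.

```python
def grid_signed(n):
    """Original formula: (2*i - (n + 1)) / n for i = 1..n, strictly inside (-1, 1)."""
    return [(2 * i - (n + 1)) / n for i in range(1, n + 1)]


def grid_unit(n):
    """Modified formula: i / (n - 1) for i = 0..n-1, covering [0, 1] inclusive."""
    return [i / (n - 1) for i in range(n)]


signed = grid_signed(4)
unit = grid_unit(4)
rescaled = [(v + 1.0) / 2.0 for v in signed]

print(signed)    # [-0.75, -0.25, 0.25, 0.75]
print(rescaled)  # [0.125, 0.375, 0.625, 0.875]
print(unit[0], unit[-1])  # 0.0 1.0
```

Note that rescaling the original grid to [0, 1] gives pixel centres (0.125, 0.375, ...), while the modified grid puts the first and last pixels exactly at 0 and 1. The two are therefore not the same grid shifted, so labels should be prepared with whichever convention the layer uses.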

Check my PR #8; it has an option for the user to specify 0to1 or -1to1.