Question: Softmax output has 10 units, but the training loop uses a single number?
Closed this issue · 3 comments
Hello,
I was looking at the code, and congrats! It looks good and is exactly what I was looking for. There are not many examples of AC-GANs.
There is one part I don't fully understand.
Normally, when doing classification, we have a softmax output with size equal to the number of classes (like the discriminator's softmax output layer here).
To be "compatible" with that softmax layer, we have to one-hot encode the label so it becomes a probability distribution: class 0 would be the vector [1,0,0,0,0,0,0,0,0,0], class 1 would be [0,1,0,0,0,0,0,0,0,0], and so on.
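As a quick aside, that one-hot construction can be written in one line with NumPy (just an illustration, not code from this repo):

```python
import numpy as np

# Build the one-hot vectors described above: class 0 -> [1,0,...,0], etc.
labels = np.array([0, 1, 7])
one_hot = np.eye(10, dtype=int)[labels]

print(one_hot[0])  # [1 0 0 0 0 0 0 0 0 0]
```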
But in the training loop you pass the class directly, without one-hot encoding:
epoch_gen_loss.append(combined.train_on_batch(
[noise, sampled_labels.reshape((-1, 1))], [trick, sampled_labels]))
The variable sampled_labels holds the class index directly (no one-hot encoding), yet the model has 10 units for this output:
aux = Dense(10, activation='softmax', name='auxiliary')(features)
return Model(input=image, output=[fake, aux])
The code works, so what am I missing?
Is one-hot encoding not necessary, or is it there and I just didn't see it?
thanks
If you look here you can see sparse_categorical_crossentropy, which uses the integer label rather than a one-hot encoding.
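To make the equivalence concrete, here is a small NumPy sketch (an illustration, not code from the repo) showing that the "sparse" loss computed from an integer label matches the categorical cross-entropy computed from its one-hot encoding:

```python
import numpy as np

# A hypothetical predicted distribution over 10 classes (sums to 1).
probs = np.array([0.1, 0.7, 0.05, 0.05, 0.02, 0.02, 0.02, 0.02, 0.01, 0.01])
label = 1  # integer class, as sparse_categorical_crossentropy expects

# "Sparse" form: index directly into the predicted distribution.
sparse_loss = -np.log(probs[label])

# One-hot form: dot the one-hot vector with the log-probabilities.
one_hot = np.eye(10)[label]
dense_loss = -np.sum(one_hot * np.log(probs))

print(np.isclose(sparse_loss, dense_loss))  # True: both equal -log(0.7)
```

So the one-hot encoding is effectively done implicitly by the loss function; the label format just has to match the loss you picked.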
Hello,
Thanks a lot. I did not notice that detail (I never use the "sparse" version :-) ).
Just one other thing to confirm:
When creating the input for the generator, the label is used to "give color to the noise", guiding the latent vector toward generating the right class, correct? And this is done by multiplying the random noise with the label embedding, right?
With this code:
# this is the z space commonly refered to in GAN papers
latent = Input(shape=(latent_size, ))
# this will be our label
image_class = Input(shape=(1,), dtype='int32')
# 10 classes in MNIST
cls = Flatten()(Embedding(10, latent_size,
                          init='glorot_normal')(image_class))
# hadamard product between z-space and a class conditional embedding
h = merge([latent, cls], mode='mul') ## <-- HERE
fake_image = cnn(h)
return Model(input=[latent, image_class], output=fake_image)
That is correct, yes. This isn't canonical (if I'm not mistaken, the original paper just adds a one-hot vector to the latent space), but I found this to be more intuitive and to work a bit better.
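The conditioning above can be sketched in plain NumPy (a hypothetical stand-in for the learned `Embedding(10, latent_size)` table, not code from the repo):

```python
import numpy as np

rng = np.random.default_rng(0)
latent_size = 100

# Hypothetical embedding table: one learned latent_size-dim vector per class.
# In the real model these rows are trained jointly with the generator.
class_embeddings = rng.normal(size=(10, latent_size))

z = rng.normal(size=(latent_size,))  # random latent vector ("z space")
label = 3                            # class to condition on

# Hadamard-product conditioning: the class embedding elementwise-scales
# the noise, so each class reshapes the latent vector differently.
h = z * class_embeddings[label]

print(h.shape)  # (100,)
```

Compared with concatenating or adding a one-hot vector, the multiplicative embedding lets every latent dimension be rescaled per class rather than only shifting a few coordinates.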