Model gives erroneous values as prediction
mikegazzaruso opened this issue · 4 comments
Hi, I trained my model with tf/keras.
Model:
model = Sequential()
### First Layer
model.add(Dense(100, input_shape=(40,)))
model.add(Activation('relu'))
model.add(Dropout(0.3))
### Second Layer
model.add(Dense(200))
model.add(Activation('relu'))
model.add(Dropout(0.5))
### Third Layer
model.add(Dense(200))
model.add(Activation('relu'))
model.add(Dropout(0.5))
### Final Layer
model.add(Dense(num_labels))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='adam')
I exported the JSON weights with your script; this is the result when I import it:
# dimensions: 40
Layer: dense
Dims: 100
Layer: unknown
Dims: 100
Layer: unknown
Dims: 100
Layer: dense
Dims: 200
Layer: unknown
Dims: 200
Layer: unknown
Dims: 200
Layer: dense
Dims: 200
Layer: unknown
Dims: 200
Layer: unknown
Dims: 200
Layer: dense
Dims: 10
Layer: unknown
Dims: 10
First question: are those "Unknown" layers OK? They should be the Activation and Dropout layers.
However, later in the C++ code:
#include <RTNeural/RTNeural.h>
#include <fstream>
std::ifstream jsonStream("model.json", std::ifstream::binary); // assuming the exported file is named "model.json"
auto model = RTNeural::json_parser::parseJson<double>(jsonStream, true);
const double testInput[] = { -378.06, 17.45, -49.15, 37.95, 10.41, 35.25, 8.45, 12.91, 0.70, 23.27, 7.05, 21.32, -3.18, 7.67, 2.08, 17.77, -0.00, 8.85, 0.94,6.91, 4.10, 10.23, 2.81, 9.09, 4.54, 7.57, 6.54, 8.60, 5.48, 5.16, 3.71, 6.40, 5.05, 4.76, 0.70, 0.68, 5.02, 1.84, 3.80, 4.50 };
model->forward(testInput);
const double* testOutput = model->getOutputs();
In testOutput the first 10 values of the array are populated with quite large double values (e.g. 5007.7344438 and so on), while the output prediction should be a vector of categorical class labels (e.g. [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]).
Feeding the same coefficients into the model in Python and running inference correctly outputs the right class label.
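For reference, the Python-side check looks roughly like this (just a sketch; I'm assuming testInput holds the same 40 coefficients as the C++ array above):
import numpy as np
# Run the same coefficients through the Keras model and take the argmax
# to recover the predicted class label.
x = np.array([testInput])          # shape (1, 40)
probabilities = model.predict(x)   # softmax output, shape (1, num_labels)
print(np.argmax(probabilities))    # prints the winning class index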
What am I missing? My feeling is that the problem lies in those "Unknown" layers.
Thanks for your help, and thanks for the library.
OK, I think I figured it out.
The problem seems to be the Dropout and Activation layers declared separately: RTNeural can't properly detect their type and marks them as "Unknown".
I rewrote and retrained my model without Dropout layers, specifying the activation functions directly in the network's layers.
Python/TF/Keras:
model = Sequential()
### First Layer
model.add(Dense(100, input_shape=(40,), activation='relu'))
#model.add(Activation('relu'))
#model.add(Dropout(0.3))
### Second Layer
model.add(Dense(200, activation='relu'))
#model.add(Activation('relu'))
#model.add(Dropout(0.5))
### Third Layer
model.add(Dense(200, activation='relu'))
#model.add(Activation('relu'))
#model.add(Dropout(0.5))
### Final Layer
model.add(Dense(num_labels, activation='softmax'))
#model.add(Activation('softmax'))
Network detected by RTNeural in C++:
# dimensions: 40
Layer: dense
Dims: 100
activation: relu
Layer: dense
Dims: 200
activation: relu
Layer: dense
Dims: 200
activation: relu
Layer: dense
Dims: 10
activation: softmax
No "Unknown" Layers this time.
And now, when calling getOutputs(), prediction I get is something like this:
output[0]: something34383488e-13
output[1]: something8983488e-12
output[2]: something8433e-13
output[3]: 0.9999999999998
output[4]: something3849843e-13
...... etc.
and this seems correct to me, because the right class label vector for this inference should be:
[0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
(tested in Python)
So I think the problem is not the Dropout layers themselves, but the fact that the dropout value (and the activation) should be passed directly in the layer's declaration rather than as separate layers.
What do you think?
Thank you
Hi again!
So for the Dropout layer, I think the best thing to do is pass it to the layers_to_skip argument in the save_model function. The reason being that Dropout is typically only active while a model is being trained, and since RTNeural only performs inference, it doesn't really need to know about the Dropout layers. Maybe we should add keras.layers.Dropout as one of the default layers to skip?
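Roughly, the export call would then look something like this (just a minimal sketch; I'm assuming the helper lives in model_utils.py, that layers_to_skip takes the layer classes to skip, and 'model.json' is a placeholder filename):
from tensorflow import keras
from model_utils import save_model  # RTNeural's Python export script
# Skip Dropout at export time, since it's inactive during inference anyway.
# (Sketch only: the exact form expected by layers_to_skip may differ.)
save_model(model, 'model.json', layers_to_skip=[keras.layers.Dropout])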
For the Activation layers, I typically use them in my TensorFlow code as in your second example, but it would be nice if RTNeural could support both ways of writing the code in TensorFlow. I'll have to think about it a bit more, but I think there's a way to make it work.
I just passed the dropout and activation directly in my layers' declarations; no problems this way.
You did great work with the library, Sir.
Kudos!
By the way, with e2ec8ea, RTNeural can now load models using activations as in your initial example.
Anyway, closing for now since I think this is solved.