sh1r0/caffe-android-demo

Inconsistent NN output on Snapdragon chipset (+ others)

woodthom2 opened this issue · 2 comments

We added a loop around the call to the NN in MainActivity.java, and switched the NN call to getConfidenceScore so that the NN output vector is returned to Java.
We found that the values returned by the NN sometimes differ from those of a previous call with the same input data.
This behaviour is very frequent on the HTC One M9, which uses a Snapdragon chipset.
On other devices it also occurs, but to a lesser degree.
After several calls to the NN, the output vector is sometimes quite far from its correct value.
Sometimes the network starts returning NaN. Once this happens it does not recover, and it returns only NaN thereafter. We have observed the NaNs only on the HTC One M9. Other devices have shown less frequent deviations in the output vector values, but they have not returned NaN.

        @Override
        protected Integer doInBackground(String... strings) {
            startTime = SystemClock.uptimeMillis();
            float[] lastScores = new float[0];
            // Run the network 20 times on the same image and compare each output with the previous one.
            for (int i = 0; i < 20; i++) {
                System.out.println("Iteration " + i);
                float[] scores = caffeMobile.getConfidenceScore(strings[0]);
                for (int j = 0; j < lastScores.length; j++) {
                    if (scores[j] != lastScores[j]) {
                        // Arrays.toString prints the element values; Arrays.asList on a float[] wraps the whole array as one element.
                        System.out.println("Found inconsistency: " + i + " " + j + ", " + scores[j] + " " + lastScores[j] + " " + Arrays.toString(scores) + " " + Arrays.toString(lastScores));
                    }
                }
                lastScores = scores;
            }

            return caffeMobile.predictImage(strings[0])[0];
        }
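
The loop above compares with an exact `!=`, which reports any difference however small, and which reports every element once NaN appears, because NaN compares unequal to everything including itself. A minimal sketch of a check that separates the two cases, assuming a tolerance-based comparison is acceptable for this test. `ScoreCheck`, `firstNaN` and `maxAbsDiff` are hypothetical names and are not part of caffe-android-demo; the sample arrays in `main` are made up for illustration.

```java
/** Hypothetical helper, not part of caffe-android-demo. */
final class ScoreCheck {
    private ScoreCheck() {}

    /** Returns the index of the first NaN in scores, or -1 if there is none. */
    static int firstNaN(float[] scores) {
        for (int i = 0; i < scores.length; i++) {
            if (Float.isNaN(scores[i])) {
                return i;
            }
        }
        return -1;
    }

    /**
     * Returns the largest absolute element-wise difference between a and b.
     * The result is NaN if any compared element is NaN, because Math.max
     * returns NaN when either argument is NaN.
     */
    static float maxAbsDiff(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("length mismatch: " + a.length + " vs " + b.length);
        }
        float max = 0f;
        for (int i = 0; i < a.length; i++) {
            max = Math.max(max, Math.abs(a[i] - b[i]));
        }
        return max;
    }

    public static void main(String[] args) {
        // Made-up sample vectors, standing in for two consecutive network outputs.
        float[] previous = {0.25f, 0.50f, 0.25f};
        float[] current = {0.25f, 0.75f, Float.NaN};
        System.out.println("first NaN index: " + firstNaN(current));
        System.out.println("max abs diff: " + maxAbsDiff(previous, current));
    }
}
```

Inside the loop, one could log `firstNaN(scores)` once per iteration and report an inconsistency only when `maxAbsDiff(scores, lastScores)` exceeds a chosen threshold; the threshold itself is a judgment call and is not given in this issue.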

Here is the output that comes from the network over a few loops:

Iteration   Value of first index of vector
0           3.1437678E-5
1           2.0848423E-5
2           3.1718344E-5
3           3.204372E-5
4           3.1441232E-5
5           3.11287E-5
6           3.1445536E-5
7           3.0625208E-5
8           3.1417538E-5
9           3.1448155E-5
10          NaN
11          NaN

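
To put the size of these deviations in perspective, the logged values can be expressed relative to the first one. This is only a sketch: treating iteration 0 as the reference is an assumption (the correct value is not stated in this issue), and `DriftReport` and `relativeDeviation` are hypothetical names.

```java
/** Hypothetical helper: relative deviation of each logged value from the first one. */
final class DriftReport {
    private DriftReport() {}

    /** Returns |values[i] - values[0]| / |values[0]| for every i; NaN inputs yield NaN. */
    static double[] relativeDeviation(float[] values) {
        double reference = values[0];
        double[] deviation = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            deviation[i] = Math.abs(values[i] - reference) / Math.abs(reference);
        }
        return deviation;
    }

    public static void main(String[] args) {
        // Values copied from the log above (first element of the output vector).
        float[] logged = {
            3.1437678E-5f, 2.0848423E-5f, 3.1718344E-5f, 3.204372E-5f,
            3.1441232E-5f, 3.11287E-5f, 3.1445536E-5f, 3.0625208E-5f,
            3.1417538E-5f, 3.1448155E-5f, Float.NaN, Float.NaN
        };
        double[] deviation = relativeDeviation(logged);
        for (int i = 0; i < deviation.length; i++) {
            System.out.printf("%2d  %.2f%%%n", i, 100.0 * deviation[i]);
        }
    }
}
```

With these numbers, iteration 1 differs from iteration 0 by roughly a third, while most of the other non-NaN iterations stay within a few percent.
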
sh1r0 commented

Hi @woodthom2, I think it's an issue related to Eigen (sh1r0/caffe-android-lib#57). The prebuilt libs in this repo were built with Eigen (see #28 (comment)), and I did not find such issues with OpenBLAS.

Thank you. I will check and reply on that thread.