ml5js/ml5-next-gen

`ml5.neuralNetwork` reorders labels that are numerical strings


Hi all,

I've been trying to train an MNIST handwritten digit classifier with ml5.neuralNetwork, using numerical strings like "1", "2", "3" as labels. The trained neural network seems to classify inputs essentially at random, even though the loss converges during training.

After hours of debugging, I realized the neural network behaves as expected if I change my labels to non-numerical strings like "Digit 1", "Digit 2", "Digit 3". A few more tests with the neuralNetwork color classifier example showed that numerical string labels seem to be reordered when added as outputs, producing inaccurate classifications after training.

I haven't had a chance to look at the source code, but I suspect this might have something to do with how JavaScript treats numerical strings?

Hi @jackbdu, thank you for submitting this issue!

I looked into this a bit, and I think you are right about the numerical strings being treated differently.

Here is the code mapping the unformatted classifications to the ml5 output format:

// Grab the output label name and the legend, which maps each class
// label to its one-hot encoding.
const label = Object.keys(meta.outputs)[0];
const vals = Object.entries(meta.outputs[label].legend);

// Pair each raw confidence score with the legend entry at the same
// index, then sort the results by descending confidence.
const formattedResults = unformattedResults.map((unformattedResult) => {
  return vals
    .map((item, idx) => {
      return {
        [item[0]]: unformattedResult[idx],
        label: item[0],
        confidence: unformattedResult[idx],
      };
    })
    .sort((a, b) => b.confidence - a.confidence);
});

Here is the metadata object when running your example (screenshot not shown); notice that Chrome already displays the legend keys in numerical order.

The mapping assumes Object.entries returns the entries in insertion order. However, in modern JavaScript, Object.entries enumerates keys the same way a for...in loop does: integer-like keys (strings of non-negative integers, like "1", "2", "3") come first in ascending numeric order, and only the remaining string keys keep their insertion order. So numeric-string labels get silently reordered, while labels like "Digit 1" do not, and vals[idx] no longer lines up with unformattedResult[idx].
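A quick illustration of the ordering difference, runnable in any modern JS engine:

const numericKeys = {};
numericKeys["3"] = true;
numericKeys["1"] = true;
numericKeys["2"] = true;
// Integer-like keys are enumerated in ascending numeric order,
// not insertion order:
console.log(Object.keys(numericKeys)); // ["1", "2", "3"]

const stringKeys = {};
stringKeys["Digit 3"] = true;
stringKeys["Digit 1"] = true;
stringKeys["Digit 2"] = true;
// Non-numeric string keys keep their insertion order:
console.log(Object.keys(stringKeys)); // ["Digit 3", "Digit 1", "Digit 2"]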

I will do a little more testing to make sure this is the issue and come up with a fix. Will keep you updated!

Hi @jackbdu, I opened a PR to address this bug! After sorting the legend entries by their one-hot encodings, the bug disappears.
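A minimal sketch of the idea (assuming, as in the metadata above, that each legend value is a one-hot array like [0, 1, 0]):

// Sketch only: recover the original training order by sorting the
// legend entries by the position of the 1 in each one-hot encoding,
// so that vals[idx] lines up with unformattedResult[idx] again.
const vals = Object.entries(meta.outputs[label].legend).sort(
  (a, b) => a[1].indexOf(1) - b[1].indexOf(1)
);

This makes the mapping independent of how Object.entries orders the keys, so numeric-string labels and non-numeric labels behave the same way.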

@ziyuan-linn Thank you for addressing this so quickly! Your solution looks good to me!