harthur/brain

Output is NaN

alexnix opened this issue · 18 comments

I train my network on a data set of cars (year of fabrication, mileage, type, and model as input; price as output). When I try to predict the price of another car, the output is NaN. NaN is not even among the values in the training set, so this seems like an issue with the brain module.

My code is in a GitHub Gist here: https://gist.github.com/alexnix/146fea914501d283c80635087dd87036

@alexnix - Your input type and model are non-numeric values. You must normalize your data first. I suggest one-hot encoding them: https://github.com/nickpoorman/one-hot

One-hot encoding for type and model?

I just learned from a YouTube video that input values must be in [-1, 1]. I am still wondering how to represent model and type. I could just label them with numbers, like Audi 0.1, Opel 0.2, and so on, but this would make the network treat Audi and Opel as similar because their labels are close to each other. Is there a good way to represent inputs with discrete, uncorrelated values (such as car model, in my example)?

Disclaimer: noob in AI/ML/NN here.

@alexnix - Yes, one-hot encoding solves the issue of having one or more features with "categorical" values. In your case, for example, type will be expanded from a single vertical column in the matrix ("Mazda", "Ford", "Volkswagen", "Renault", "Kia", "Hyundai", etc.) into multiple horizontal columns, one per type, each holding a flag that is 1 if the row is that type and 0 otherwise.

For example: Your first input is: { year: 2009, mileage: 311000, type: "Mazda", model: "CX-7" }

So it will become something like:

[2009, 311000, 1, 0, 0, 0, 0, 0, ...]

Where the header for the columns might be:

[year, mileage, type_mazda, type_ford, type_volkswagen, type_renault, type_kia, type_hyundai, ...]
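
The expansion described above can be sketched in plain JavaScript without any library. This is only an illustration: `makeOneHotEncoder` is a hypothetical helper, not part of brain or the linked one-hot module, and the second and third cars are made-up rows added so that there is more than one category.

```javascript
// Hypothetical helper: build a one-hot encoder from the distinct values seen in a column.
function makeOneHotEncoder(values) {
  // Keep first-seen order so column positions stay stable.
  const categories = [...new Set(values)];
  return {
    categories,
    encode(value) {
      // One flag per known category; an unseen value encodes as all zeros.
      return categories.map(category => (category === value ? 1 : 0));
    }
  };
}

// First row is from the thread; the other two are invented for illustration.
const cars = [
  { year: 2009, mileage: 311000, type: 'Mazda' },
  { year: 2012, mileage: 120000, type: 'Ford' },
  { year: 2015, mileage: 60000, type: 'Kia' }
];

const typeEncoder = makeOneHotEncoder(cars.map(car => car.type));
const rows = cars.map(car => [car.year, car.mileage, ...typeEncoder.encode(car.type)]);

console.log(typeEncoder.categories); // [ 'Mazda', 'Ford', 'Kia' ]
console.log(rows[0]); // [ 2009, 311000, 1, 0, 0 ]
```

The year and mileage columns are still unscaled here; they would need to be normalized before training.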

You might also want to normalize your values between 0 and 1 for this library. Get the min and max value for each column and then scale them between 0 and 1. https://github.com/nickpoorman/scale-number-range
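
As a rough sketch of the min/max scaling just described, the following scales each column of a numeric matrix into [0, 1]. `minMaxScaleColumns` is a hypothetical helper written for this example (it is not the scale-number-range API), and mapping a constant column to 0 is an assumption made here to avoid a division by zero.

```javascript
// Hypothetical helper: scale every column of a numeric matrix into [0, 1].
function minMaxScaleColumns(rows) {
  const columnCount = rows[0].length;
  const mins = new Array(columnCount).fill(Infinity);
  const maxs = new Array(columnCount).fill(-Infinity);

  // First pass: find the min and max of each column.
  for (const row of rows) {
    row.forEach((value, i) => {
      mins[i] = Math.min(mins[i], value);
      maxs[i] = Math.max(maxs[i], value);
    });
  }

  // Second pass: apply (value - min) / (max - min) per column.
  return rows.map(row =>
    row.map((value, i) => {
      const range = maxs[i] - mins[i];
      // A constant column has range 0; dividing would give NaN, so map it to 0 (assumption).
      return range === 0 ? 0 : (value - mins[i]) / range;
    })
  );
}

// Columns: year, mileage (illustrative values).
const scaled = minMaxScaleColumns([
  [2009, 311000],
  [2012, 120000],
  [2015, 60000]
]);

console.log(scaled[0]); // [ 0, 1 ]
console.log(scaled[2]); // [ 1, 0 ]
```

When predicting for a new car, the same per-column min and max from the training data would have to be reused, otherwise the inputs end up on a different scale than the one the network was trained on.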

Thank you for your advice, it was very useful indeed.

Is this issue resolved?

Dok11 commented

@nickpoorman, can I ask you something?
You wrote:

Where the header for the columns might be:
[year, mileage, type_mazda, type_ford, type_volkswagen, type_renault, type_kia, type_hyundai, ...]

Does this mean the original poster must also encode "model" the same way as in your example? Like this:
{model__mazda_cx_7: 1, model__bmw_x5: 0, model__nissan_xtrail: 0,...}

That's so many columns. Is that normal?

@Dok11, brain.js only allows for numeric values as inputs. One-Hot encoding allows you to transform Y-axis values into X-axis inputs with "ON or OFF" values.

This will naturally increase the dimensionality of the inputs, so yes, you will always end up with more columns. To reduce the number of columns, run your data set through dimensionality reduction via PCA, Lasso, or some other means.

This brain library is probably not what you want for highly dimensional data. Try using a library that can do matrix transforms quickly via BLAS or some other more efficient means.

@nickpoorman & @Dok11, the new repository does do matrix transforms via the recurrent neural net...
Example from: BrainJS/brain.js@338cf70#diff-04c6e90faac2675aa89e2176d2eec7d8R25

//create a simple recurrent neural network
var net = new brain.recurrent.RNN();

net.train([{input: [0, 0], output: [0]},
           {input: [0, 1], output: [1]},
           {input: [1, 0], output: [1]},
           {input: [1, 1], output: [0]}]);
	
var output = net.run([0, 0]);  // [0]
output = net.run([0, 1]);  // [1]
output = net.run([1, 0]);  // [1]
output = net.run([1, 1]);  // [0]

Dok11 commented

@robertleeplummerjr, that's cool, but not for this task, right?
P.S. Yes, I am building a very similar NN, so this question is very interesting to me ;)

Dok11 commented

P.P.S. Where can I see more examples?
https://github.com/harthur-org/brain.js/wiki is empty...

Dok11 commented

@nickpoorman: This will naturally increase the dimensionality of the inputs, so yes, you will always end up with more columns. To reduce the number of columns, run your data set through dimensionality reduction via PCA, Lasso, or some other means.

What about defining a dictionary as:
['cx-7', 'x5', 'x-trail', ...]
and using its keys in the input array, like:
{... type: 0, ...}

Would this work correctly?

I have the same problem, but all of the input signals are already normalized:

const neural = require('../NeuralNetwork').toFunction() // This directory holds the neural network and the training array
neural({
  albums: 0.011111111111111112,
  videos: 0.016523867809057527,
  audios: 0,
  notes: 0,
  photos: 0.00035337249878528203,
  friends: 0.009302790837251175,
  mutual_friends: 0,
  followers: 0.007113002799187086,
  subscriptions: 0,
  pages: 0.0063083522583901085,
  wall: 0.0005448000778285826
}) // { '0': NaN }

An example training sample:

{
  "input":{
    "albums":0,
    "videos":0.002345981232150143,
    "audios":0,
    "notes":0,
    "photos":0.019921374619020275,
    "friends":0.06461938581574472,
    "mutual_friends":0,
    "followers":0.004280263813796541,
    "subscriptions":0,
    "pages":0.0010093363613424174,
    "wall":0.22041054577293512
  },
  "output":[0]
}

When I train with only one element of the training set, the network returns a numeric result. But if I use at least 2 elements, the result is NaN.

Full training array in .json
All data were normalized using scale-number-range

@cawa-93 - I would have to take a look at the rest of your code - the setup of the network and how you are training the model. Another thing you should try is not using category mode. Simply supply your input vector as an array. Instead of this:

{
  "input":{
    "albums":0,
    "videos":0.002345981232150143,
    "audios":0,
    "notes":0,
    "photos":0.019921374619020275,
    "friends":0.06461938581574472,
    "mutual_friends":0,
    "followers":0.004280263813796541,
    "subscriptions":0,
    "pages":0.0010093363613424174,
    "wall":0.22041054577293512
  },
  "output":[0]
}

do this:

{
  "input":[
    0,
    0.002345981232150143,
    0,
    0,
    0.019921374619020275,
    0.06461938581574472,
    0,
    0.004280263813796541,
    0,
    0.0010093363613424174,
    0.22041054577293512
  ],
  "output":[0]
}
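
One possible way to do this conversion in code is sketched below. `toArraySample` and the `keys` list are hypothetical names introduced for this example; the key order is simply the order of the fields in the sample above, and the important assumption is that the same order is used for every training row and for every later prediction.

```javascript
// Fixed key order (taken from the sample above) so every row has the same layout.
const keys = [
  'albums', 'videos', 'audios', 'notes', 'photos', 'friends',
  'mutual_friends', 'followers', 'subscriptions', 'pages', 'wall'
];

// Hypothetical helper: turn { input: {...}, output } into { input: [...], output }.
function toArraySample(sample) {
  return {
    input: keys.map(key => sample.input[key]),
    output: sample.output
  };
}

const sample = {
  input: {
    albums: 0,
    videos: 0.002345981232150143,
    audios: 0,
    notes: 0,
    photos: 0.019921374619020275,
    friends: 0.06461938581574472,
    mutual_friends: 0,
    followers: 0.004280263813796541,
    subscriptions: 0,
    pages: 0.0010093363613424174,
    wall: 0.22041054577293512
  },
  output: [0]
};

const arraySample = toArraySample(sample);
console.log(arraySample.input.length); // 11
console.log(arraySample.input[10]); // 0.22041054577293512
```

A key that is missing from a row would show up as undefined in the array, so rows with a different shape are easier to spot this way.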

I've been using this in production for three years (training millions of models and making billions of predictions monthly), I assure you there is nothing wrong with the library.

@nickpoorman I created a simple repository for you: cawa-93/user-scaner

I noticed that if I train the network on objects, numeric data is stored in net.json; however, if I train on arrays, all values are null.

You're letting negative values return from scaleNumberRange, which I assume is your means of normalizing values.

/**
 * simple module to scale a number from one range to another
 */
var debug = require('debug')('scale-number-range');

module.exports = function scaleNumberRange(number, oldMin, oldMax, newMin, newMax) {
  if (process.env.SCALE_THROW_OOB_ERRORS) {
    if (number < oldMin) {
      debug('ERROR OOB - scale(%d, %d, %d, %d, %d)', number, oldMin, oldMax, newMin, newMax);
      throw new Error('number is less than oldMin');
    }
    if (number > oldMax) {
      debug('ERROR OOB - scale(%d, %d, %d, %d, %d)', number, oldMin, oldMax, newMin, newMax);
      throw new Error('number is greater than oldMax');
    }
  }
  const result = (((newMax - newMin) * (number - oldMin)) / (oldMax - oldMin)) + newMin;
  console.log(result);
  return result;
}

Outputs:

$ babel-node --presets es2015-node ./test
-1
-0.9953080375356997
-1
-1
-0.9601572507619595
-0.8707612283685106
-1
-0.9914394723724069
-1
-0.9979813272773151
-0.5591789084541298
{ '0': NaN }

If I multiply result by -1, I still get NaN, so I'm still investigating.

@cawa-93 - A few issues with your code. First, you should filter out any user data that doesn't have the same shape. The following is going to cause issues:

{
  "id": 305576398,
  "counters": {
    "unknown": true
  }
}

To do this use a filter:

const learnArray = users
.filter(user => {
  for (let key in maxRages) {
    if (typeof user.counters[key] === 'undefined') {
      return false
    }
  }
  return true
})
.map(user => {
  let result = {
    input: {},
    output: []
  }

  for (let c in user.counters) {
    if (c !== 'messages' && c !== 'online_friends') {
      result.input[c] = scale(user.counters[c], 0, maxRages[c], 0, 1)
    }
  }

  result.output.push(user.counters.messages > 3 ? 1 : 0)

  return result
})
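
Scaling a missing counter means doing arithmetic on undefined, which yields NaN in JavaScript. As an extra safety net alongside the filter above, a check along these lines could flag bad rows before they reach training. `isFiniteSample` is a hypothetical helper and the two samples are made up for illustration.

```javascript
// Hypothetical helper: true only if every input and output value is a finite number.
function isFiniteSample(sample) {
  // Accept both array inputs and keyed-object inputs.
  const inputs = Array.isArray(sample.input) ? sample.input : Object.values(sample.input);
  return [...inputs, ...sample.output].every(value => Number.isFinite(value));
}

const samples = [
  { input: { albums: 0, wall: 0.22 }, output: [0] },
  // undefined * 1 evaluates to NaN, mimicking a scaled missing counter.
  { input: { albums: undefined * 1, wall: 0.5 }, output: [1] }
];

const badSamples = samples.filter(sample => !isFiniteSample(sample));
console.log(badSamples.length); // 1
```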

Also, you should scale to [0, 1] instead of [-1, 1].

Lastly, instead of using toFunction(), you should just use run to solve your NaN problem.

I've updated some of the code in this gist: https://gist.github.com/nickpoorman/cd9465edca726df8dc06dbdd2937d153

lol, beat me to it! In all fairness, I was getting a haircut.